Intelligent Reindexing on Filer based volumes

Problem

A reindexing job in our billing system DB caused an outage on the SQL server, by filling up the snapshot space for the db_log volume on out NetApp FIler. This happened due to the indexes being very highly fragmented causing a huge amount of log churn when rebuilding.

Reindexing in SQL Server is necessary in defragmenting indexes, which over time have become fragmented. Learn more here

http://sql-articles.com/articles/performance-tunning/index-fragmentation-in-sql-server-2005/

https://dbamohsin.wordpress.com/2011/05/20/fillfactor-of-an-index-slow-down-fragmentation/

I’ve been working on re-engineering the reindexing process so that it can consider the churn and limit the work done based on churn.

Solution

The new Reindexing process is as follows:

Every night at 23:00 a job runs called “Maintenance – Populate Fragmentation” runs on the SQL Server. This populates a table called TMG_Index_Fragmentation in the Billing database which is used in the job detailed below. This job records all fragmentation in the database.

Every Tuesday & Sunday at 01:00 a job called “Intelligent Reindex/Reorganise” runs on the SQL server. This does the Reindexing for the billing system DB based on the following criteria:

Only Reindex while the churn created by the reindexing is less that 5gb – if the upper limit is reached then reindexing stops

  • Reindex if above 25% fragmented
  • Reorganise if between 10% and 25% fragmented
  • Reindex online if QUERY_GOVERNER_COST_LIMIT is more than 30 seconds (Discussed here – https://dbamohsin.wordpress.com/2011/02/08/query-control-with-query_governer_cost_limit/). We have a table which stores these objects (TMG_High_Cost_Objects)
  • Skip objects at index level if churn limit is likely to be broken by reindexing (calculated by – current churn + (index reserved space * 1.10) < 5120)
    The Procedure which does the reindexing is detailed below. This procedure can potentially work on any database (as long as the objects needed have been created)

See attached log for what you would expect to see after reindexing job has completed (Intelligent Reindex-Reorganise.log).

A summary is also produced for the workload completed:

############ SUMMARY OF WORKLOAD ############## [SQLSTATE 01000]
Sum of Index Size Rebuilt: 4408 MB [SQLSTATE 01000]
Log Size on Completion: 3074 MB [SQLSTATE 01000]
Number of Indexes Rebuilt online: 10 [SQLSTATE 01000]
Number of Indexes Rebuilt offline: 76 [SQLSTATE 01000]
Number of Indexes Reorganized: 9 [SQLSTATE 01000]
Number of Indexes Skipped – Size Quota: 593 [SQLSTATE 01000]
Number of Indexes Skipped – No Fragmentation: 0 [SQLSTATE 01000]

DECLARE @Table_Name varchar(255)
DECLARE @Index_Name varchar(255)
DECLARE @Reserved int
DECLARE @Used int
DECLARE @Fragmentation int
DECLARE @Page_Count int
DECLARE @Fragment_Count int
DECLARE @SumMB int
DECLARE @onlinerebuildcount int
DECLARE @offlinerebuildcount int
DECLARE @reorgcount int
DECLARE @maxreindexsize int
DECLARE @SkipCount int
DECLARE @SkipCountMB int
DECLARE @UsedMB int

-- Reset Variables used for Summary Report
SET @SumMB = 0
SET @onlinerebuildcount = 0
SET @offlinerebuildcount = 0
SET @reorgcount = 0
SET @SkipCountMB = 0
SET @SkipCount = 0

-- Maximum total reindex size in MB. This dictates how many indexes are picked up for rebuild
SET @maxreindexsize = 5120

DECLARE TableCursor CURSOR FOR
    SELECT 
        OBJECT_NAME(i.[object_id]) as [Table_Name],
        i.name as [Index_Name], 
        (a.total_pages * 8) / 1024 as [Reserved],
        (a.used_pages * 8)/ 1024 as [Used],
        fr.avg_fragmentation_in_percent as [Fragmentaiton], 
        fr.page_count as [Page_Count], 
        fr.fragment_count as [Fragment_Count]
    FROM sys.indexes as i
        inner join tmg_index_fragmentation fr on ((fr.object_id = i.object_id) and (fr.index_id = i.index_id))
        inner join sys.partitions as p on i.object_id = p.object_id and i.index_id = p.index_id 
        inner join sys.allocation_units as a on p.partition_id = a.container_id
-- Rebuilding re-enables an Index therefore discount disabled indexes
    WHERE i.is_disabled = 0
    ORDER BY fr.avg_fragmentation_in_percent DESC

OPEN TableCursor
FETCH NEXT FROM TableCursor INTO @Table_Name, @Index_Name, @Reserved, @Used, @Fragmentation, @Page_Count, @Fragment_Count
WHILE @@FETCH_STATUS = 0
BEGIN
    
SELECT @UsedMB = 
SUM(CONVERT(NUMERIC(10,2),ROUND(FILEPROPERTY([name],'SpaceUsed')/128.,2)))
FROM [sys].[database_files]
Where Type_desc = 'LOG'

    IF (@UsedMB + (@Reserved*1.2)) < @maxreindexsize -- Max churn per reindexing window. Add 20% Leeway to sizing
    BEGIN
        IF @Fragmentation > 25    
        BEGIN
            IF @Table_Name in (Select name from TMG_High_Cost_Objects where reindex_mode = 'ONLINE')
            BEGIN
                PRINT 'REINDEX ONLINE STARTED - '+ @Index_Name + ' ON ' + @Table_Name + ' AT ' + convert(varchar(20),getdate(),113)
                EXEC ('ALTER INDEX [' + @Index_Name + '] ON ' + @Table_Name + ' REBUILD WITH (SORT_IN_TEMPDB=ON, ONLINE=ON);');
                SET @onlinerebuildcount = @onlinerebuildcount + 1
                PRINT 'REINDEX ONLINE COMPLETED - '+ @Index_Name + ' ON ' + @Table_Name + ' AT ' + convert(varchar(20),getdate(),113)
            END 
            ELSE
            BEGIN
                PRINT 'REINDEX OFFLINE STARTED - '+ @Index_Name + ' ON ' + @Table_Name + ' AT ' + convert(varchar(20),getdate(),113)
                EXEC ('ALTER INDEX [' + @Index_Name + '] ON ' + @Table_Name + ' REBUILD WITH (SORT_IN_TEMPDB=ON);');
                SET @offlinerebuildcount = @offlinerebuildcount + 1
                PRINT 'REINDEX OFFLINE COMPLETED - '+ @Index_Name + ' ON ' + @Table_Name + ' AT ' + convert(varchar(20),getdate(),113)
            END
        
        END
        ELSE IF @Fragmentation > 10 and @Fragmentation <= 25
        BEGIN
            PRINT 'REORGANIZE STARTED - '+ @Index_Name + ' ON ' + @Table_Name + ' AT ' + convert(varchar(20),getdate(),113)    
            EXEC ('ALTER INDEX [' + @Index_Name + '] ON ' + @Table_Name + ' REORGANIZE;');
            SET @reorgcount = @reorgcount + 1
            PRINT 'REORGANIZE COMPLETED - '+ @Index_Name + ' ON ' + @Table_Name + ' AT ' + convert(varchar(20),getdate(),113)
        END
        ELSE
        BEGIN
            PRINT 'OBJECT SKIPPED - '+ @Index_Name + ' ON ' + @Table_Name + '. Minimal Fragmentation (' + CAST(@Fragmentation as varchar(5)) + '%)'
            SET @SkipCount = @SkipCount + 1 
        END
    SET @SumMB = @SumMB + @Reserved
    END
    ELSE
    BEGIN
        PRINT 'OBJECT SKIPPED - '+ @Index_Name + ' ON ' + @Table_Name + '. Failed Size Check (' + CAST(@Reserved as varchar(5)) + ' MB)'
        SET @SkipCountMB = @SkipCountMB + 1 
    END

FETCH NEXT FROM TableCursor INTO @Table_Name, @Index_Name, @Reserved, @Used, @Fragmentation, @Page_Count, @Fragment_Count
END

PRINT '############ SUMMARY OF WORKLOAD ##############'
PRINT 'Sum of Index Size Rebuilt: ' + CAST(@SumMB as varchar(6)) + ' MB'
PRINT 'Log Size on Completion: ' + CAST(@UsedMB as varchar(6)) + ' MB'
PRINT 'Number of Indexes Rebuilt online: ' + CAST(@onlinerebuildcount as varchar(6))
PRINT 'Number of Indexes Rebuilt offline: ' + CAST(@offlinerebuildcount as varchar(6))
PRINT 'Number of Indexes Reorganized: ' + CAST(@reorgcount as varchar(6))
PRINT 'Number of Indexes Skipped - Size Quota: ' + CAST(@SkipCountMB as varchar(6))
PRINT 'Number of Indexes Skipped - No Fragmentation: ' + CAST(@SkipCount as varchar(6))

CLOSE TableCursor

DEALLOCATE TableCursor

return
GO

One Response to Intelligent Reindexing on Filer based volumes

  1. danieladeniji says:

    Moshin:

    Thanks for putting so much work in the Public Domain.

    Couple of follow-ups:

    Would not be prudent to disable NetApp Snapshots on clearly marked Log Volumes. Assuming you have clearly defined NetApp Volumes and LUNS. And, OS Physical and Logical Disks. Furthermore, that your Log files are isolated on these Log Disks.

    It seems that you might be able to disabled NetApp Snapshot:

    snap schedule 0 0 0
    snap reserve 0

    Unless, actual Log files and not transaction backup files are a big part of your database recovery strategy.

    Once again, thanks for endlessly sharing so precisely.

Leave a reply to danieladeniji Cancel reply