- Create the new table with the clustered (and nonclustered) indexes you want.
- Create a control table to maintain high-low watermarks. Set the high water mark to the maximum ascending key value in the live table.
- Perform a bulk load (SSIS Data Flow task) for all the live table rows up to and including the high water mark. This task might take several hours.
- Set the low water mark to the high water mark, then reset the high water mark to the new maximum ascending key value in the live table.
- Perform the bulk load again. This time it will pick up only the rows inserted during the earlier bulk load. After this task, the new table will be very close to, but not quite, up to date with the live table.
- Within the scope of a transaction, lock the live table, insert the records above the high water mark, rename the tables (and PKs) and commit the transaction. The new table is now live.
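The first step has no sample SQL below, so here is a minimal sketch. The column `SomeValue` and the index name are placeholders; the only assumption taken from the rest of the article is that the table is keyed on an ascending bigint column `rid`:

```sql
-- Hypothetical sketch: the new table mirrors MyLargeTable's columns,
-- but is created empty with the clustered index you actually want.
create table MyLargeTable_new(
    rid        bigint      not null,  -- the ascending key used as the watermark
    SomeValue  varchar(50) null,      -- placeholder for the remaining columns
    CONSTRAINT [PK_MyLargeTable_new] PRIMARY KEY CLUSTERED (rid ASC)
);

-- Any nonclustered indexes are also created up front (name and column are placeholders):
create nonclustered index IX_MyLargeTable_new_SomeValue
    on MyLargeTable_new(SomeValue);
```

Creating the indexes before the load is what makes the bulk load do the sorting work; the table is then swapped in fully indexed.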
A few notes on the process
- You will need to have sufficient space in your database to accommodate two copies of the table.
- You will need to have sufficient space in your database logs to hold the log records generated from the start of the Main Bulk Load to its end. So, if the bulk load takes 4 hours, there will be a transaction open for 4 hours, and log truncation won't be able to free anything logged during those 4 hours.
- You will need to have sufficient space in tempdb to perform the sorts for the clustered index and nonclustered indexes.
- If you have sufficient space in the database files and logs, set the maximum insert commit size on the destination to 0 (zero). Other values tend to slow down the load (although less disk space is required) because the indexes are maintained at each commit.
- It is quite likely that your large tables are partitioned and compressed. This doesn't need to make any difference to the reclustering process above. However, if the table is already partitioned, you could improve performance by looping through each partition. If your table isn't already partitioned, this process could be the opportunity to partition your table.
- Don't be tempted to use NOLOCK when reading the live table; under NOLOCK a scan can miss rows or read them twice, leaving the new table silently inconsistent.
- Test your package before you run it in production.
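If the table is already partitioned, the main bulk load above can be replaced by a per-partition loop. A minimal T-SQL sketch; the partition function name `pf_MyLargeTable` is an assumption, and it is assumed the table is partitioned on `rid`:

```sql
-- Hypothetical sketch: load one partition at a time instead of the whole table.
declare @p int = 1;
declare @maxp int = (select max(partition_number)
                     from sys.partitions
                     where object_id = object_id('MyLargeTable')
                       and index_id in (0, 1));  -- heap or clustered index rows

while @p <= @maxp
begin
    insert into MyLargeTable_new
    select *
    from MyLargeTable
    where $PARTITION.pf_MyLargeTable(rid) = @p
      and rid <= (select HighWater from ReclusteringControl);
    set @p = @p + 1;
end
```

Each iteration commits separately, so the log only needs to hold one partition's worth of records at a time.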
There is no special code in the package I have created. The most significant task is the very last one, which is performed within the scope of a transaction. To ensure there is no opportunity for a deadlock, the task first takes an X lock on the live table. Here is some sample SQL from each of the tasks.
Create Control table
create table ReclusteringControl(
    ReclusteringControlID int default 1,
    LowWater bigint not null default 0,
    HighWater bigint not null default 0,
    CONSTRAINT [PK_ReclusteringControl] PRIMARY KEY NONCLUSTERED (ReclusteringControlID ASC)
)
Insert into ReclusteringControl values(1,0,0)
update ReclusteringControl set HighWater=(select max(rid) from MyLargeTable)
Input for Main Bulk Load
select * from MyLargeTable where rid<=(select HighWater from ReclusteringControl)
Set New HighWatermark
update ReclusteringControl set LowWater=HighWater
update ReclusteringControl set HighWater=(select max(rid) from MyLargeTable)
Second Bulk Load (input statement)
select * from MyLargeTable where rid<=(select HighWater from ReclusteringControl) and rid>(select LowWater from ReclusteringControl)
Final INSERT and Rename
--set identity_insert MyLargeTable_new on
--This is required if there is an identity column on the table
GO
begin transaction
update ReclusteringControl
set LowWater=HighWater
select top 0 * from MyLargeTable with (tablockx) --take an exclusive lock on the live table up front
update ReclusteringControl set HighWater=(select max(rid) from MyLargeTable)
insert into MyLargeTable_new --columns need to be explicitly declared if identity cols are inserted
select * from MyLargeTable where rid<=(select HighWater from ReclusteringControl) and rid>(select LowWater from ReclusteringControl);
exec sp_rename MyLargeTable,XXXX_MyLargeTable;--this old table should be deleted later
exec sp_rename MyLargeTable_new,MyLargeTable;
exec sp_rename PK_MyLargeTable, PK_MyLargeTable_XXXX;
exec sp_rename PK_MyLargeTable_new, PK_MyLargeTable;
commit transaction
--set identity_insert MyLargeTable off
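Before dropping the renamed-away table, a quick sanity check can confirm the copy is complete. A sketch, assuming rid is unique and the old table was renamed to XXXX_MyLargeTable as above:

```sql
-- Row counts: NewRows should equal OldRows plus any rows inserted since the swap.
select
    (select count_big(*) from MyLargeTable)      as NewRows,
    (select count_big(*) from XXXX_MyLargeTable) as OldRows;

-- No rid present in the old table should be missing from the new one:
select count_big(*) as MissingRows
from XXXX_MyLargeTable o
where not exists (select 1 from MyLargeTable n where n.rid = o.rid);
-- MissingRows should be 0.
```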