Importing Data to SQL Server

I was recently asked to help a client migrate some data from IBM DB2 to SQL Server.

It was about 900 tables and a couple of terabytes of data. Before I came onsite, the client had extracted the data from DB2, a process which alone took a couple of days.

What I was presented with was the following: one file for each table, containing INSERT INTO statements for each row to be inserted into SQL Server. The largest file was about 2 GB, the smallest a few KB. Due to time constraints it was not possible to re-extract the data into a format where e.g. BULK INSERT or SSIS could be used.

The client had tried to import one of the larger ones – around 1.3 GB with about 3.2 million lines – using the command-line utility OSQL. After 14 hours we stopped the process.

My first thought was to insert explicit transactions into the files, giving a structure something like this:

BEGIN TRANSACTION
INSERT INTO …

INSERT INTO …
COMMIT TRANSACTION

I wrote a small utility which would transform the original files into the above structure, given the overall size of the file and a maximum allowed batch size, i.e. the number of lines in each transaction.
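
The transformation tool itself was just a small throw-away utility. A minimal sketch of the idea in Python (the file names and the batch size here are illustrative, not the values used on site):

# Wrap every batch_size lines of INSERT statements in an explicit
# transaction. If the line count is an exact multiple of batch_size,
# a trailing empty transaction is emitted, which is harmless.
def wrap_in_transactions(src_path, dst_path, batch_size=50000):
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("BEGIN TRANSACTION\n")
        for count, line in enumerate(src, start=1):
            dst.write(line)
            if count % batch_size == 0:
                dst.write("COMMIT TRANSACTION\nBEGIN TRANSACTION\n")
        dst.write("COMMIT TRANSACTION\n")

# Hypothetical file names for illustration.
wrap_in_transactions(r"C:\Input\MYTABLE.sql", r"C:\Output\MYTABLE_batched.sql")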

This did not really help at all. The aforementioned file still took ages.

As we were running out of time, a request was made to upgrade the SQL Server machine from 2 to 4 cores and to 16 GB of RAM. This of course did not solve our import problem, but at least we got some more juice to play with.

I next made two changes, which turned out to solve the problem:

  1. Used SQLCMD instead of OSQL
  2. Split input files up into smaller chunks

SQLCMD was introduced with SQL Server 2005. It uses OLE DB rather than ODBC for connectivity, and it is available as a 64-bit binary. Both of these of course help performance.

To split up the files into smaller chunks I found a small utility called TextFileSplitter. The utility is free to use and comes with both a UI and a command-line interface.

Writing a small CMD file automated the process of splitting up the largest files into chunks of 100K lines each.

SET INPUT=C:\Input
SET OUTPUT=C:\Output
SET LINES=100000

FOR %%F IN (%INPUT%\*.*) DO TextFileSplitterConsole -i=%%F -o=%OUTPUT% -splitstrategy:ls:%LINES%
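
In case the utility is not at hand, the same line-based split is easy to script. A minimal Python sketch under the same assumptions as the CMD file above (the paths, the UTF-8 encoding and the .partNNN naming are illustrative):

import os

INPUT, OUTPUT, LINES = r"C:\Input", r"C:\Output", 100000

def flush(name, part, chunk):
    # Write one chunk to OUTPUT as <name>.partNNN.
    with open(os.path.join(OUTPUT, f"{name}.part{part:03d}"), "w", encoding="utf-8") as dst:
        dst.writelines(chunk)

for name in os.listdir(INPUT):
    part, chunk = 0, []
    with open(os.path.join(INPUT, name), encoding="utf-8") as src:
        for line in src:
            chunk.append(line)
            if len(chunk) == LINES:
                flush(name, part, chunk)
                part, chunk = part + 1, []
    if chunk:
        flush(name, part, chunk)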


Once we had the smaller files, I started three parallel “CMD-file processes” to execute all the INSERT INTO statements using SQLCMD.
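
The actual runners were CMD files; the following Python sketch just illustrates the loop each one performed over its own subset of the chunk files (server, database and file pattern are hypothetical; -E, -b and -i are SQLCMD's switches for Windows authentication, abort-on-error and input file):

import glob
import subprocess

SERVER, DATABASE = "MYSERVER", "ClientDB"  # hypothetical connection details

# Each of the three parallel runners was given a different subset of files.
for path in sorted(glob.glob(r"C:\Output\*.part*")):
    subprocess.run(
        ["sqlcmd", "-S", SERVER, "-d", DATABASE, "-E", "-b", "-i", path],
        check=True,
    )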

All in all, the splitting and inserting of data took a couple of hours.

End result: happy client.
