TL;DR: Go here for a code sample.

This is long overdue post – I’m too busy these days with contract work and other commitments.

By the nature of my work, I have to do data migrations on a regular basis. Some of the migrations from one database into another, but most of the migrations are from Excel spreadsheets into a database.
A database to database migration is somewhat nerve racking – usually the structure is broken, data is dirty, fields re-purposed and all other sort of terrible things you read about on The Daily Wtf. The best so far I’ve seen are along the lines of person middle names stored in field called CompanyName2; employee information stored in table called tbl_Jobs and info about actual jobs in table tbl_Job – see the difference here? That is all to avoid the confusion!

Database-to-database migration are unique in their own kind. But I found a lot of similar things in a process of export from Excel into SQL Server. So I’ll write down these things here – for my own future reference and you, my dear reader, might find this useful as well.

Make data conversion repeatable

Depending on your situation, you might re-do the data migration once again with a new version of a spreadsheet: i.e. first time I was given an outdated version of a spreadsheet; or first import was into a UAT environment. Then a month after do the same migration into production, but with latest data.

Anyway, my main rule is to make as little modifications to the spreadsheet as possible – you can’t (easily) script these changes. And next time you do this, you’ll have to repeat them again. And you don’t want to repeat your tasks.

So I rarely do data clean-up in Excel. But if I find it easier to do in a spreadsheet, I write a VB macro and save it in a source control.

Save your migration script into version control

Chances that if you going to re-do the migration, you’ll get a different data set and a different set of issues related. So you’ll need to modify your migration script. And any piece of code modified over time should be in source control. Also if you work in a team, your team mates will be able to pull down your script.

Basic Migration Script Parts

Always set date format

SET DATEFORMAT dmy;

By default SQL Server works in US date format. Which is great if you are in US. I’m not. So I always need to specify what format I use for dates to avoid server confusion. And don’t get me started on US date format!!!

Always wrap your script in a transaction

begin transaction tran1

    -- do your migration script here

rollback transaction tran1

I always work in SQL Server Management Studio and you can select a SQL command and hit F5 to execute only the selected code. And I use this feature all the time – write a query in a file, execute only that query; adjust query; save the whole file, move next. Sometimes I forget to select a specific query and hit F5 – executing the entire script in my file by mistake. And usually my scripts are done quickly and are not meant to be executed twice or the entire file to be executed in one go. So I’ll have to do a clean-up and start the migration again in these cases. So wrapping the entire script in a transaction helps to prevent disasters, as long as you rollback the transaction.

Be careful with this one though: I’ve seen people accidentally starting transaction, but never committing it or rolling back, then starting another transaction… So if you think you’ve done a change in data, but can’t see this change in other select queries – most likely you have open transaction going (or a few nested ones). Just execute rollback transaction tran1 a few times until you see an error message.

Use #temp tables to import data from excel.

Temp tables are almost like real tables, only temporary. They are good for migration type of work.

I always create temp tables with fields as nvarchar(max). It is usually not worth to try to match the data type to what you have in Excel – most likely in the column where you always see a date, in one of the rows you’ll have something like ‘N/A’ that will throw an exception when attempted to be converted into datetime.

create table #people (
    Firstname nvarchar(max) collate SQL_Latin1_General_CP1_CI_AS,
    Surname  nvarchar(max) collate SQL_Latin1_General_CP1_CI_AS,
    Dob  nvarchar(max) collate SQL_Latin1_General_CP1_CI_AS,
    DateOfJoin  nvarchar(max) collate SQL_Latin1_General_CP1_CI_AS,
)

And include your default collation when you create the table. If collations in DB where you migrate data into does not match your temp-table collation, you won’t be able to do text matching or comparison between the two. On attempt to do a join matching on text or text-comparison with other tables, you might see this error:

Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.

If you get this, drop your temp table and re-create it with the required collation specified.

Save Excel as a tab-delimited-values file

For some reason CSV file format is the favourite in a community. But as soon as you have commas in the actual values, your process is going to be all screwed. Instead save your spreadseets as Tab-delimited: File => Save As => “Save as type” dropdown to Text (Tab delimited). This way you’ll save yourself headaches if there are commas in your data.

Migrating tab-delimited file into SQL Server takes the same effort as working with CSV:

BULK INSERT  #people
FROM 'd:\MyFiles\Path\Personnel_Details.txt'
WITH
( FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n' )
GO

Convert columns into CSV list

Use your regular expressions to convert from columns into comma-separated list. A lot of times I find useful to check data integrity with simple queries like

select * from companies where name not in ('CompanyA', 'CompanyB', 'CompanyC')

Usually these 'CompanyA', 'CompanyB', 'CompanyC' are stored in XLS in columns. So I copy-paste columns from XLS into Notepad++ and apply Regular Expression replace of (.*)\r\n into '\1',. This will convert these:

CompanyA
CompanyB
CompanyC
CompanyD

Into these: 'CompanyA', 'CompanyB', 'CompanyC', 'CompanyD',

Only you’ll need to remove final comma and paste this into SQL Script.

Create insert SQL statements from columns

For simple XLS tables like lookup tables (Countries, Types, Product Categories) sometimes it is easier to generate a bunch of insert into statements in Notepad++. Copy-paste your columns from Excel into Notepad++

Great Ltd
BT
Tesco

and use regex insert into Companies \(Name\) values \('\1'\) to convert your columns into

insert into Companies (Name) values ('Great Ltd')
insert into Companies (Name) values ('BT')
insert into Companies (Name) values ('Tesco')

Of course you’ll have to adjust your regular expression to match your needs, but these are usually simple and easy to do (because everyone has only one problem here!).

Beware that this is a manual non-repeatable step that you’ll have to re-do when (and if) re-doing the migration. Chances that this type of data getting changed is usually slim, but possible. So you’ll need to decide if this is worth doing. It is advisable to save your regexp in the same migration file as your other SQL – for your future reference.

Convert your stringy-dates to datetime

I use convert sql function to change date from nvarchar to datetime:

select 
    convert(datetime, Dob, 103) as DOB,
    convert(datetime, JoinDate, 103) as JoinDate
from #people 

This conversion works with UK `dd/mm/yyyy’ format. Go here for more date formats available in SQL Server.

Find non-dates

Excel being flexible is great and that’s why people love it. And in a column of dates you’ll always find somebody putting “last year, around March” which can never be converted to a datetime. And if you have enough rows in a table it’ll be difficult to spot these by scrolling and eye-balling.
For these cases IsDate is very useful. Here is the documentation – there are examples, go check it out.

So to spot a bad date entry you’d go

SELECT * FROM #people 
WHERE ISDATE(JoinDate) = 0 and JoinDate is not null

This would give you a list of all records with incorrect entry for JoinDate.

Find non-numbers

The same thing goes for numeric values. Use IsNumeric (documentation) to check if there are fields with values that are not correct numbers:

SELECT *
FROM #people 
WHERE ISNUMERIC(SomeNumber)<> 1

Move date from temp tables

Once you are done with clean up of temp tables, you’ll need to migrate the data into real tables. I’m sure you know how to do this, but for a reference I’ll add this here:

insert into people (
    FirstName, 
    Surname, 
    DateOfBirth, 
    DateOfJoin)
select 
    Firstname, 
    Surname, 
    convert(datetime, Dob, 103) as DateOfBirth, 
    convert(datetime, DateOfJoin, 103) as DateOfJoin,
from #people

If you notice I have put aliases for fields in select part of the statement. These don’t do anything for the insert statement. But when you have more than 10 fields to insert and not all the field names match (and they rarely do!), you’ll be grateful to yourself in a month – it’ll be easier to know what field is inserted into where.

Use IsNull

In many cases your database structure is much more rigid than excel given to you. Your application might expect date of birth always present for an employee, but excel does not always have that. For these cases you usually go back to the client and say “yo, dude! I need all the data”, and your client will be like “wha? Bro, it’ll take time. Can you deal with this without a DOB, bro? I know you can!”. And you have to deal with this crap. So you go ahead and use isnull(value, substitution) – very useful function that checks if value is null and replaces it with a substitution. I believe this is widely known and a lot of you already use it.

select 
    convert(datetime, isnull(Dob, '01/01/1900'), 103) as DateOfBirth
from #people

Use NullIf

In the case opposite from the above, sometimes I get only a whitespace in cells, where it actually should be nothing (nulls). And you really don’t want to copy noise into your clean new database. So use nullif(value, comparison) – function that returns null if value matches comparison

select 
    nullif(LeaveReason, '') as LeaveReason
from #people

Conditional Insert

If Statements are not supported in select statements in SQL Server. So you need to use Switch:

select
    case when Sex = 'Male' then 0 when sex='Female' then 1 else null end as Gender
from #people

Sample

I’ve created a sample Excel file and migrated it into a sample database with all the above tricks are applied. Go to Github for the repository.

  • Mike Poole

    A different approach I’ve taken with Excel to SQL migrations is developing a custom .NET utility. I’ve used https://github.com/paulyoder/LinqToExcel to get data out of Excel and LINQ-to-SQL to write to the destination database.

    • Actually, after writing this article, I have found a big pile of problems with my approach and just last week I have been working on in internal generic library that uses EF metadata to lookup tables, create parent records, etc. LinqToExcel was one of the candidates to be used, but I did not use it because it required Excel installed on the server – it is OK for local development, but we needed that on Azure. So I used http://epplus.codeplex.com/ and quite happy with this decision. Not as neat for querying as LinqToExcel, but did the job wonderfully.

      • Mike Poole

        epplus is what I generally use for generating Excel reports. I’ve not used it yet for reading from excel files but sounds good and I’ll check it out.

        • As I said, it is not as good for reading as LinqToExcel, but awesome for writing reports!

  • John Venec

    Useful article . For my two cents , if your company are wanting to merge
    PDF files , my family found a tool here https://goo.gl/XbZs56

  • John Venec

    Wow… what a great post! I also can be helpful here :) I mostly use
    AltoMerge to merge PDF files. You can easily merge your documents here
    https://goo.gl/53ug0d

  • Aide Redde

    Wow… what a great post! I also can be helpful here :) I mostly use AltoMerge to merge PDF files. You can easily merge your documents here https://goo.gl/RKW8pG

  • Aide Redde

    Excellent commentary , Speaking of which , if you is interested in
    merging of are interested in merging of , my company encountered a tool
    here https://goo.gl/fpyDlq