Thursday, December 15, 2011

Tales of a Lazy DBA–MySQL, SSIS, and “0” Dates

We’ve recently been tasked with converting a lot of MySQL Data into our system from a multi-tenant MySQL Database. We don’t have a fixed schedule to actually execute the imports because they’ll be on a “per customer” basis.  Well, that sounded like a great task for SSIS.  I set up the ODBC driver, connected, set the initial “sql_mode” options for our connections to (‘mssql, allow_invalid_dates’) and started to work.
First problem we ran into with an ADO.NET connection to MySQL and writing a SELECT * from schema.table was when we hit a MySQL “Date” column containing a value of ‘0000-00-00’.  SSIS threw an error, not sure what to do.  Thanks to some others who have solved this problem, I realized that within the MySQL Select statement, we could do something like:
CASE date_created WHEN '0000-00-00' THEN NULL else date_created END as date_created
to pass those dates over as NULL. That solves the implicit conversion to datetime (SQL 2005) and avoids the invalid dates.  I ran something similar for a “Time” column to:
CAST(Time_Created as char(8) ) as Time_Created
So that solved one particular table export with about 20 or so CAST and CASE statements.  Needless to say, I wasn’t looking forward to doing this for another 500 tables with a total of almost 6000 columns.
I finally set up a really basic query to generate most of the SELECT statements we would need to pull our MySQL data across without too much pain. Admittedly, a small step and I’d still need to copy/paste when I set up each new ADO.NET source, but it worked reasonably well. I’m adding the code snippet for MySQL here in case anyone else has a similar problem.
CONCAT(CASE WHEN ordinal_position = 1 THEN 'SELECT ' ELSE ', ' END,
 CASE Data_Type WHEN 'date'
 THEN CONCAT('CASE ',column_name,' WHEN ''0000-00-00'' THEN NULL else '
    ,column_name,' END as ',column_name)
 WHEN 'time' THEN CONCAT('CAST(',column_name,' AS CHAR(8) ) as ',column_name)
 ELSE column_name END, CASE WHEN Ordinal_Position <> MaxOrd THEN ''
    FROM MySchema.', c.table_name) END
    ) as Select_Column_Name
from information_schema.columns as c
JOIN (select table_name, MAX(ordinal_position) as MaxOrd
        from information_schema.columns
        WHERE Table_Schema = 'MySchema'
        GROUP BY Table_Name) as t1
    ON c.Table_Name = t1.Table_Name
where table_schema = 'MySchema'
order by c.table_name, ordinal_position LIMIT 0, 50000;

I’d love to hear other ideas if anyone has encountered this before and come up with a more elegant solution for translating “0” Date or Time data from MySQL into SQL Server.