Thursday, October 28, 2010

SSIS, OLEDB Data Sources, and Table Variables

I ran across an interesting problem when trying to tune some SSIS data source performance issues the other day. I had noticed that we were doing a pretty large select out of a table filtered on a handful of date-related columns. We then pulled in most of the data from the table. Well, SQL Server saw this query and apparently figured that since we needed most of the columns, a table scan would be the most efficient way to get that data.  Since in real life, we only need a very small percentage of those rows, I decided to rewrite the select to pull in the PK values first, then pull in the data based on that subset.

My first attempt was to use a Temp Table. Sadly, using a temp table does not work in SSIS Data Sources, even if you put that inside a stored procedure. Disappointed, but not deterred, I decided to use a Table Variable. I quickly made the changes and tested them with some values I knew would returned quite a few results. Great – no worries.  I was still using a stored procedure at the time and used some of our standard code to start it off so had no issues.

Ultimately, I didn’t want to write a bunch of stored procedures that I would need to maintain outside of SSIS just for this particular load.  We had the code already inside of the OLE DB Sources, just not optimized as much as I’d like. I started pulling out the old queries, adding a table variable, populating it with the necessary PK values, the using that to filter out the main data set.  Save, deploy, run, see major speed increase.

Then, I looked at the actual insert/update counts. Yes, it ran much faster, but in the process, it didn’t pick up any results.  I was pretty confused because the step was successful, the code ran as expected through SSMS, there were no errors or warnings. I re-verified that running that same code got results in SSMS; it did.  I started doing some searching to see what might be happening and found this post by Dustin Ryan.

In short, it seems that if you have row counts being returned (default behavior), you return a number of rows affected when you insert the table variable, which results in a “Done in Proc” type message, which in turn tells SSIS that the data flow is done and moves on to the next step.  The workaround?  Add SET NOCOUNT ON; to the beginning of your OLEDB Source query (or make sure this is in your stored proc – we tend to put that in our procs by default).

Once again, thanks to Dustin Ryan for his post on this topic. There’s a good possibility I’d still be struggling with the root cause of this without that assistance.