Wednesday, November 11, 2009

Learning to work around DB Pro

I’ve been trying to get a multi-database solution working correctly with MS Visual Studio Team System for Database Professionals (aka DBPro or DataDude). Anyone who has tried to tie multiple co-dependent databases together within DB Pro has experienced some of the pain points in working with these projects. Hopefully these notes can help you to some extent.

Pain Point #1: Circular References

This is the largest challenge we face in our environment. DB Pro doesn’t really understand that DB A can reference DB B, which can then reference DB A.  You can easily add a Database Reference to another project in your solution by right-clicking the “References” sub-folder and adding a new Database Reference.  However, once you’ve done that, you can’t add one from that project back to the original.  I understand you don’t want to get caught in an endless loop, but it would be nice to have some sort of maximum recursion level setting that would keep the tool from getting stuck in one.

That being said, there is a way around this. You can generate a .DBSchema file for your existing database using the instructions from this site. Note that you may need to tweak the options a little bit. I think the “ModelType” parameter has been deprecated in GDR2.  Save that DB Schema file to some central folder and repeat for each database you want to reference.  Once done, go back into your projects and add a reference to each of those files. Because these are DBSchema files and not the actual database projects, DB Pro will recognize these and you’ll be able to add a reference as needed to each database.
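As a rough sketch of what that looks like, the vsdbcmd import call is something like the line below. The server, database, and output path are placeholders, and the exact switches can differ between the GDR and GDR2 releases, so check the tool's help output before relying on this.

vsdbcmd /a:Import /cs:"Data Source=MyServer;Initial Catalog=DatabaseA;Integrated Security=True" /dsp:sql /model:"C:\DBSchemas\DatabaseA.dbschema"

Run that once per database, pointing /model at your central folder, and those are the files you reference from each project.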

Remember that these schema files will not be kept up to date automatically.  As you build your DB Projects, you’ll need to update those schema files in some way.


Pain Point #2: System Objects

This isn’t a hard problem to work around and there are a lot of hints out there about this.  The easiest way to handle references to the system objects is by adding a reference to the included DBSchema files created when you installed DB Pro.  These can generally be found in %ProgramFiles%\Microsoft Visual Studio 9.0\VSTSDB\Extensions\SqlServer. There should be sub-folders for 2000, 2005, and 2008.  Simply include these references in your projects and you should be set.

This may not always help, though, if the database you reference itself references the system objects. I ran into that problem and had to find yet another workaround, which leads me to…

Pain Point #3: “Unresolved References”

When all else fails and you can’t actually get around the “Unresolved Reference” warnings (because of temp tables, a reference to another database that references the system objects, or some other reason), you have a final recourse: tweak the file’s properties to ignore certain error messages or warnings for that file.  It’s not a great solution because you could miss some valid warnings, but it can help you get past the warnings that keep popping up when you try to build or deploy.  To add an exception, just find the file, select it, and edit its properties.  The last property is “Suppress Warnings”, and that will let you bypass the warnings that normally occur when DB Pro can’t figure out what you are trying to do.  This should really only be turned on for code that is definitely working in your environment, and that code should be reviewed carefully when you enable the option.


Finally, I don’t necessarily claim that any of these are best practices. We’re learning here and trying to figure out the best way to work around the limitations and quirks in DB Pro to get it working in our environment. I’m more than happy to learn new and better ways to do things so feel free to share your tips or to correct me where I’m wrong.

Thursday, October 1, 2009

SSIS and XML Output

I was pretty surprised by this, and I'm in good company.  SSIS 2005 and 2008 do not have an XML Destination adapter. We have another "opportunity" from Microsoft to work out our own way to export XML data.  I disagree that leaving it out was a good idea, despite those who say it's easy to work around in a script component.  I'm not a great SSIS script writer.

My problem - I want to take a simple SQL query that uses FOR XML AUTO to generate XML and write the result to a file.  If you set that as the only part of your source query, you end up with an output column of type DT_IMAGE.  Converting that to text results in a long string of numbers. While I was amused by the result, it didn't help me generate the necessary XML for our partners.

I came up with a relatively simple workaround that works well for me because I'm only dealing with one XML Data Set at a time.  I put something like the following in my OLEDB Source:

DECLARE @XMLOutput XML

-- Capture the FOR XML AUTO results in an XML variable
SET @XMLOutput = (SELECT * FROM MyTable FOR XML AUTO)

-- Return the XML as a single VARCHAR(MAX) column so the source gets a text column instead of DT_IMAGE
SELECT CAST(@XMLOutput AS VARCHAR(MAX)) AS XMLResults

I was then able to pipe that into a delimited Flat File destination with a CRLF row delimiter and no column headers. That generated results I could pass on to our partners. While it's not the most elegant solution, it was much easier for me than trying to write and maintain a script component just to handle XML.

Thursday, September 17, 2009

Data Loads

One of the things I really like about SQL Server Integration Services is that it can load large data sets quickly, especially if you need to look up values along the way.  However, there are times when things behave in strange ways that are hard to explain.  We were recently trying to load about 64 million rows into a warehouse-type database and kept hitting a bottleneck once we got past around 30-35 million rows.  No matter what I tried, the load would start at about 2 million rows per minute and gradually slow down until it reached about 10,000 rows per second.  I found several helpful posts, including some pointers to check the data flow buffer settings (DefaultBufferMaxRows, DefaultBufferSize, and the like).  Tweaking those had helped someone on a memory-bound machine with a very similar problem.  I tried it and saw my initial load times improve, but the load still slowed to a crawl at around the same point.

I learned more about looping through (aka "shredding") a recordset of CustomerID values so I could work through the set in chunks.  I set up a For Loop to process 5 million records at a time, and I checked for other processes running against the table. Nothing seemed to help.  I had posted information on Twitter, had a co-worker looking at the package with me, and even got some replies from Paul Randal.  (Thank you for the assist, Paul - sorry I wasn't checking my tweets more regularly at the time.)
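For what it's worth, the For Loop container setup I'm describing boils down to three expressions along the lines below. The variable names and the 5-million-row batch size are just placeholders for the example.

InitExpression:    @[User::StartID] = 0
EvalExpression:    @[User::StartID] < @[User::MaxCustomerID]
AssignExpression:  @[User::StartID] = @[User::StartID] + 5000000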

In the process above, I struggled to set the CustomerID range dynamically in my source query and eventually resorted to treating the SQL command as an expression and hacking my variables into the WHERE clause.  It worked, but it definitely felt awkward. I am more than open to suggestions on how to use SSIS variables inside of a FOR loop as part of the OLEDB Source command.  I also learned a little more about watching variable values - like needing to have a breakpoint set before they'll be available.  And my co-worker pointed me to a wonderful discussion showing that DtExecUI runs only in 32-bit mode. I'd been using that as a quick and easy way to plug in my parameters and keep an eye on the general progress of the load.
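To illustrate what I mean by treating the SQL command as an expression, the property ends up being built as one long expression string, something like the sketch below. The table and variable names here are made up for the example.

"SELECT CustomerID, OrderDate, Amount FROM dbo.OrderFact WHERE CustomerID > "
+ (DT_WSTR, 12) @[User::StartID]
+ " AND CustomerID <= "
+ (DT_WSTR, 12) (@[User::StartID] + 5000000)

Every change to the query means editing a quoted string inside an expression, which is a big part of why it feels so awkward.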

About the same time that Paul suggested checking my index fragmentation levels, I remembered seeing a very similar behavior pattern several years ago, when a co-worker was trying to update a column in the middle of a somewhat wide clustered index on a multi-million row table. The server was thrashing all over the place, trying to re-order millions of rows as that command ran.  I checked my table and, sure enough, there was a wide clustered index on it that would cause exactly this behavior.  I don't know why I didn't check it earlier, other than assuming I'd just created a plain test table with no indexes on it.  In retrospect, that was poor planning on my part.  In the future, I'll remember to check for clustered indexes before I try to load millions of (unsorted) rows into a new table.
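If you want a quick way to check for this before kicking off a big load, a query along these lines against sys.indexes will list any clustered index on the target table (the table name below is just a placeholder):

-- List the clustered index (if any) and its key columns on a hypothetical load target
SELECT i.name AS index_name, i.type_desc, c.name AS key_column, ic.key_ordinal
FROM sys.indexes AS i
    JOIN sys.index_columns AS ic
        ON ic.object_id = i.object_id AND ic.index_id = i.index_id
    JOIN sys.columns AS c
        ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.CustomerLoadTest')
    AND i.type_desc = 'CLUSTERED'
ORDER BY ic.key_ordinal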

It was a somewhat humbling experience, mostly because one of the basic things I know about loading lots of data was also something I completely ignored while doing all of my troubleshooting. Still, a little humility is a good thing, and I've re-learned several techniques in the process. The best part was seeing a six-hour load process reduced to 30 minutes, including the time to drop and re-create the clustered index. That's going to make a noticeable difference in our nightly processes.
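In case it helps anyone else, the general pattern is roughly the one below. The index and table names are placeholders, and whether dropping the index first pays off depends on how much data is already in the table.

-- Drop the clustered index so the load doesn't have to keep millions of incoming rows in order
DROP INDEX IX_CustomerLoadTest_CustomerID ON dbo.CustomerLoadTest

-- (run the SSIS package / bulk load here)

-- Re-create the clustered index once the data is in place
CREATE CLUSTERED INDEX IX_CustomerLoadTest_CustomerID
    ON dbo.CustomerLoadTest (CustomerID)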

Wednesday, September 16, 2009

Getting Started

Well, I was inspired by Steve Jones (of SQL Server Central fame) to start a professional blog, if only to record some of my thoughts and experiences. I'll be writing semi-regularly on SQL Server T-SQL, SSIS, Reports, workarounds, interesting news bits, and things of interest to those who dabble in the world of SQL Server. A lot of it will probably be familiar to those who work with SQL Server quite a bit, but I'm hoping someone will find this helpful.

Thanks for stopping by and feel free to let me know how I'm doing.

-Peter Schott