Microsoft SQL Server Webcast Series


Highly recommended top shelf free to attend presentation material:
Microsoft SQL Server webcast series

author: Compudicted | Posted On Tuesday, February 26, 2013 10:14 PM | Comments (0)

Virtual Launch Event for the new Office 365 for business!


Virtual Launch Event for the new Office 365
for business!

Date Wednesday, February 27th 8 am PDT and 5 pm PDT

Why Attend?

Learn how the new Office 365 can help people do
their best work in a world of devices and services

Hear customers talk about how Office 365
is transforming the way they deliver productivity
tools across their organization

See how Office 365 delivers new experiences
combining the power of social with collaboration,
email and unified communications

Join in a live Q&A with Microsoft executives
and product experts

Register Here


author: Compudicted | Posted On Thursday, February 21, 2013 11:12 PM | Comments (0)

My New Productivity Enhancer – SQL Complete by Devart


As the new year is quickly approaching I am feeling increasingly worried about what goals I should be setting for 2013 as a database professional. I tried to get a glimpse back into 2012 analyzing trends in the recent software released. And suddenly one thing stood up – lots of productivity enhancements, utilities and tools hit the market! Of course I will not be able to list them all up here, but the most notable are the frequent SSDT releases with tons of unprecedented functionality never heard of before, hundreds of extensions on the MSDN Visual Studio Gallery, CodePex and other popular sites.

So, as you have probably already guessed by now, this is one of the articles about productivity enhancing tools, well it may sound like drugs or steroids, but hey, in 2013 IT market we will ought to deliver even more code in less time and of high quality at the same time.

So how to stay on top?

The answer is very simple: automate, automate, and… automate what you do very often and/or for too long.

Sure as any other database developer or DBA I spend most of my day churning SQL code by hand. Well, this article is not about replacing me with a machine though. So not to worry. Let’s be open minded in how we approach the task.

I initially started thinking about what I can do to automate the process using what I already have. I knew that the SSMS (2008 and up) comes with IntelliSense (TM), but that was not enough because I often type the same code chunks, and that is not a problem again as even the SSMS 2005 allowed a developer to create custom keyboard shortcuts that would allow placing any T-SQL code one wishes:

image

I could see that getting my repetitive code appear in the editor was not a big deal, by the way, I solved the issue of the total absence of any snippets taking functionality this way. Hmm, so then what else remains to automate? I stopped right there and the next morning came in to start another editing session with… reformatting (or beatifying my day old code). Well, I thought we have online formatters, but they are insecure, and a standalone formatter requires me leaving the SQL Server Management Studio editor, not too-too productive and professional. The next item that caught me by surprise was that I needed to navigate through my code quickly and it is too time consuming, you know, the memory seldom can keep track of what was created for too long either, so here was another missing item in the puzzle. 

At that point I went on to a quest of finding my perfect productivity enhancement add-on for SSMS.

After playing around with a couple of tools advertised, I have quickly came to realize that there are many offerings for various budgets and tasks, but what stood out of the crowd was the just released version of Devart’s SQL Code Complete!

The items I liked the most are as follows:

  • Fully customizable look of the suggestion pane (notice the vertical positioning bar that is movable), so neat!

image

  • The suggestions come with the variable type (seen above, too)
  • The suggestions expand further walking through every step in a call image
  • The ability to format the document so it becomes very professional;
  • Go to definition is very useful in finding the object in the Object Explorer: image
  • The Document Outline is just sassy in letting you finding what you need because it has a filer: image
  • The Snippet Manager is impressive, I only lacked the ability to double click on a snippet to insert it into the SSMS: image so I went ahead and submitted my idea to Devart (it is so nice somebody is listening!). If you think you need it too, please vote here.
  • Now the next feature that I dug was really the big deal – customization! It comes with all the possible settings you can apply to your SQL code, just click on edit profile and you get: image  The profile can be saved as a file and thus distributed among all the developers at the shop. Very nice! For example in all code you can instruct to put all the items in a select statement onto a separate line like this: image  These customizations are very thorough and can take a while to set to satisfy your taste, but a very good time investment. I only thought if it is possible to share the settings, then it makes sense to buy this tool for the entire software shop, but I did not see the so called ‘site’ license option. Thus submitted such a suggestion.

Sure this extension has more to offer and explore, as a 30 days trial you can really do so and I encouraged that, so before I forget not not to mention nothing beats SQL Complete in load speed with SSMS!

To sum up here I need to say that toward the end of this article I started to realize that SQL Complete add-on is a very mighty utility in letting you achieve high coding standards if used by the entire development team, something that is not available even in the Policy Based Management, if not coming soon, than rather complimenting it.

Did I mention this add-on exists in freeware form (but not as capable)?

Go get it now!

author: Compudicted | Posted On Tuesday, December 18, 2012 12:13 PM | Comments (0)

How to Generate a Create Table DDL Script Along With Its Related Tables


Have you ever wondered when creating table diagrams in SQL Server Management Studio (SSMS) how slickly you can add related tables to it by just right-clicking on the interesting table name?

Have you also ever needed to script those related tables including the master one? And you discovered you have dozens of related tables? Or may be no SSMS at your disposal?

That was me one day.

Well, creativity to the rescue!

I Binged and Googled around until I found more or less what I wanted, but it was all involving T-SQL, yeah, a long and convoluted CROSS APPLYs, then I saw a PowerShell solution that I quickly adopted to my needs (I am not referencing any particular author because it was a mashup):

   1:  ###########################################################################################################
   2:  # Created by: Arthur Zubarev on Oct 14, 2012                                                              #
   3:  # Synopsys: Generate file containing the root table CREATE (DDL) script along with all its related tables #
   4:  ###########################################################################################################
   5:   
   6:  [System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | out-null
   7:   
   8:  $RootTableName = "TableName" # The table name, no schema name needed 
   9:   
  10:  $srv = new-Object Microsoft.SqlServer.Management.Smo.Server("TargetSQLServerName")
  11:  $conContext = $srv.ConnectionContext
  12:  $conContext.LoginSecure = $True
  13:  # In case the integrated security is not used uncomment below
  14:  #$conContext.Login = "sa"
  15:  #$conContext.Password = "sapassword"
  16:  $db = New-Object Microsoft.SqlServer.Management.Smo.Database
  17:  $db = $srv.Databases.Item("TargetDatabase")
  18:   
  19:  $scrp = New-Object Microsoft.SqlServer.Management.Smo.Scripter($srv)
  20:  $scrp.Options.NoFileGroup = $True
  21:  $scrp.Options.AppendToFile = $False
  22:  $scrp.Options.ClusteredIndexes = $False
  23:  $scrp.Options.DriAll = $False
  24:  $scrp.Options.ScriptDrops = $False
  25:  $scrp.Options.IncludeHeaders = $True
  26:  $scrp.Options.ToFileOnly = $True
  27:  $scrp.Options.Indexes = $False
  28:  $scrp.Options.WithDependencies = $True
  29:  $scrp.Options.FileName = 'C:\TEMP\TargetFileName.SQL'
  30:   
  31:  $smoObjects = New-Object Microsoft.SqlServer.Management.Smo.UrnCollection
  32:  Foreach ($tb in $db.Tables)
  33:  {
  34:     Write-Host -foregroundcolor yellow "Table name being processed" $tb.Name
  35:     
  36:     If ($tb.IsSystemObject -eq $FALSE -and $tb.Name -eq $RootTableName) # feel free to customize the selection condition 
  37:     {
  38:        Write-Host -foregroundcolor magenta $tb.Name "table and its related tables added to be scripted." 
  39:        $smoObjects.Add($tb.Urn)
  40:     }
  41:  }
  42:   
  43:  # The actual act of scripting
  44:  $sc = $scrp.Script($smoObjects)
  45:   
  46:  Write-host -foregroundcolor green $RootTableName "and its related tables have been scripted to the target file."

Enjoy!

author: Compudicted | Posted On Sunday, October 14, 2012 9:14 PM | Comments (0)

Capture a Query Executed By An Application Or User Against a SQL Server Database in Less Than a Minute


At times a Database Administrator, or even a developer is required to wear a spy’s hat. This necessity oftentimes is dictated by a need to take a glimpse into a black-box application for reasons varying from a performance issue to an unauthorized access to data or resources, or as in my most recent case, a closed source custom application that was abandoned by a deserted contractor without source code.

It may not be news or unknown to most IT people that SQL Server has always provided means of back-door access to everything connecting to its database. This indispensible tool is SQL Server Profiler. This “gem” is always quietly sitting in the Start – Programs – SQL Server <product version> – Performance Tools folder (yes, it is for performance analysis mostly, but not limited to) ready to help you!

So, to the action, let’s start it up. Once ready click on the File – New Trace button, or using Ctrl-N with your keyboard. The standard connection dialog you have seen in SSMS comes up where you connect the standard way:

image

One side note here, you will be able to connect only if your account belongs to the sysadmin or alter trace fixed server role.

Upon a successful connection you must be able to see this initial dialog:

image

At this stage I will give a hint: you will have a wide variety of predefined templates:

image

But to shorten your time to results you would need to opt for using the TSQL_Grouped template.

Now you need to set it up.

In some cases, you will know the principal’s login name (account) that needs to be monitored in advance, and in some (like in mine), you will not. But it is VERY helpful to monitor just a particular account to minimize the amount of results returned.

So if you know it you can already go to the Event Section tab, then click the Column Filters button which would bring a dialog below where you key in the account being monitored without any mask (or whildcard): 

image

If you do not know the principal name then you will need to poke around and look around for things like a config file where (typically!) the connection string is fully exposed. That was the case in my situation, an application had an app.config (XML) file with the connection string in it not encrypted:

image

This made my endeavor very easy. So after I entered the account to monitor I clicked on Run button and also started my black-box application.

Voilà, in a under a minute of time I had the SQL statement captured:

image

author: Compudicted | Posted On Monday, September 24, 2012 12:05 PM | Comments (1)

Validate SSRS Report Input Parameters the Proper Way


It should not be a secret that sooner or later in the BI developer’s career one would face the necessity to perform a complex input parameter validations.

This necessity to validate data is typically triggered by the report design itself allowing great freedom to the end user to supply input data in a quite relaxed form. Alas, as what I heard from William Vaughan, “every data are dirty, unless proven otherwise”, and I totally agree!

I do not want to go to great extends here describing what the negative consequences are in failing to invalidate bad input, so let’s just not forget to provision data validation each time we design any report. And I would like to encourage you right away that the input validation is not terribly difficult or even time consuming endeavor at all, especially for those who are armed with the proper VB scripting knowledge.

However, even though I already seen a significant number of blog posts covering this topic, and after consulting for so many years, from my personal experience, I still have not seen so far even a single report I touched where the validation would be done 100% properly and efficiently.

This negative experience has made me highly motivated to write a post with a comprehensive example of how a BI developer should approach and implement the input data validation.

Another item I would like to tell up in front is that even though I will be covering the date range validation, the technique is quite generic and can be applied to any kind of input data validation.

To demonstrate, I will start off with an already created report. In fact, this was one of the reports I was tasked with to introduce the input parameters validation into.

It is called “Transaction Load – Daily” report which is part of a real usage tracking solution. The report takes in two parameters – the report start date and its end date.

The business rule imposed was that the start date - end date range should not exceed 31 days between, and the other I figured begged to be implemented is to prevent an end user (a human!) entering a start date greater than the report’s end date.

So a report starting state would be just two dates to pick:

image

Very simple!

To cover the range rule I decided to implement it so it would present a user with a red label that tells one about this mishap. One note: raising modal boxes with an exclamation mark kind of messages in SSRS reports is seldom a good idea because you even may not know where the report is going executing and thus it’s functionality not guaranteed to work.

So I went ahead and added a text box that displays exactly that. Plus, to drive its visibility, I created a report variable Last_Valid_Report_End_Date (let’s ignore its expression for a moment):

image

This variable would hold the last correct report end date, so anything entered greater than that value will be deemed as invalid.

This condition is easily checked using the following expression:

=IIF(Parameters!InputDate_To.Value < Variables!Last_Valid_Report_End_Date.Value, true, false)

End result for the cosmetics is

image

OK, we got the message and its visibility tackled, so here is the expression that drives the value of the Last_Valid_Report_End_Date report variable:

   1:  =DATEVALUE(CSTR(Year(DATEADD("d",31,Parameters!InputDate.Value))) + "-" + 
   2:  CSTR(Month(DateAdd("d",31,Parameters!InputDate.Value))) + "-" + 
   3:  CSTR(Day(DateAdd("d",31,Parameters!InputDate.Value))))

It feels like all the heavy-lifting is done at this point, but hey, what about the report data? There is nothing stopping it from executing and it will, for free, why? Not good. Let’s be wise and prudent by not asking the report engine to go and try to bring our data and thus making un-necessary trips to the database.

To do that we need to be a little crafty. To not bring the data we may opt for applying a parameter that would prevent the query from getting any data if the business rule does not permit – besides, this is the key point, in all reports, I did not see anybody going this far, but it is such a pity!

So to prevent the report from fetching any data the best you can do is to add a WHERE clause that works against that.

In my example the WHERE condition ended up being

<a necessary SELECT statement> AND (@IsRangeValid <> 0)

To make this parameter transparent to the end user and yet fed from his or her input you mark it hidden.

image

Note: the parameter data type is Boolean, but in the expression it is getting compared to is the 0 (zero), this is due to the fact the SSRS uses Visual Basic as its scripting engine where a true value is –1 and 0 is false.

Another note here is that the SSRS expressions are merely Visual Basic elements of code, therefore very portable between the expressions to report code.

On this note, we are going to cover the VB script portion. To add it, access the report’s code property page by right-clicking anywhere on the report designer canvas

image

The code is very simplistic and here it is in full for reference:

   1:  'Invalidates the duration between the start date and end date
   2:  Function ValidateDatesApart(StartDate as DateTime , EndDate as DateTime) as Boolean
   3:      If EndDate <= DATEVALUE(CSTR(Year(DATEADD("d",31,StartDate))) + "-" + 
   4:  CSTR(Month(DateAdd("d",31,StartDate))) + "-" + CSTR(Day(DateAdd("d",31,StartDate)))) Then
   5:          Return"True"
   6:      Else
   7:          Return "False"
   8:      End if
   9:  End Function
 
Now go back to the report parameter declaration and click on its Default Values property page

image

After you clicked on Specify values option the Value drop-down becomes available where you copy and past the following expression to:

=Code.ValidateRange(Parameters!InputDate.Value,Parameters!InputDate_To.Value)

Please note: despite the Itellisense (TM) underlines the ValidateRange portion of the expression in red as it would be invalid, in fact it is perfectly fine.

Do not forget to make any necessary other report elements hidden if the parameter validation failed, besides, if you wish, and it is a better design, you can now use the parameter to drive the visibility

image

At this point we have a working expression that helps us to prevent the un-necessary trips to the database

image

author: Compudicted | Posted On Tuesday, August 14, 2012 11:27 PM | Comments (8)

Learn About The Microsoft Private Cloud to Win a Trip for Two to Mexico!


Microsoft has released new and exciting products that will change the way IT Pros utilize Virtualization and Microsoft Private Cloud solutions.   Two products which are a part of these great changes are the newly released System Center 2012 and the soon to be released Windows Server 2012.  Both of these solutions were designed to make virtualization and extending to the private cloud simpler and much more efficient. 

With these new changes to Infrastructure and the IT world, it’s a great time to learn about these new solutions and keep yourself and your organization ahead of the curve in terms of where technology is headed.  In fact, Microsoft has even added an incentive to learn about their Private Cloud solutions through the Skyrocket Sweepstakes! 

Entering is easy!  All you have to do is register, and then download a free TechNet evaluation like Windows Server 2012 RC or System Center 2012 to get started.  Every applicable evaluation you download gives you an entry into the sweepstakes! And the best part is the more evaluations you download, the better your chances.  And what’s the prize you may ask? Oh, just a 7 day, 8 night trip for two to Cozumel, Mexico!

The contest ends September 6th so don’t wait!  Register now!

author: Compudicted | Posted On Thursday, August 9, 2012 4:22 PM | Comments (1)

SSIS 2012: How to Share Cache Among Multiple Packages


Prior to SSIS 2012 it was not possible to re-use or share the same cache across packages. What that meant is if you created a cache you will be able to consume it only in the very package where the cache was instantiated. Therefore, a developer could take only two routes from here – either make as much heavy ETL processing as possible inside the same package where the cache resides, or populate yet another cache object in every other package where it is needed. The latter was especially harmful leading to unnecessary heavy extra database calls and an extended development time. This limitation has been overcome in the SSIS 2012 release.

Let’s take a look at how to implement a shared cache and what are the new opportunities this improvement brings to us!

Will begin with creating a new SSIS 2012 package of course.

Fire up the SQL Server Data Tools (a Visual Studio 2010 Shell will open if you have no Visual Studio 2010 of any kind installed, or a new instance of Visual Studio 2010 whatever edition you have will pop up).

1. Building the Cache and Warming it Up

Create a new Business Intelligence – Integration Services project, then drag and drop a new Data Flow Task. Name it. At this stage you should have something like:

image

Go into DFT design tab in which add a new source connection. In my case I built a query to extract some personal data from the AdventureWorks2008R2 database using a join as follows:

   1:  -- Select a limited number of records for performances sake
   2:  SELECT TOP 1000
   3:  P.[BusinessEntityID],
   4:  PP.[PhoneNumber],
   5:  P.[FirstName],
   6:  P.[LastName],
   7:  P.[EmailPromotion]
   8:  FROM [AdventureWorks2008R2].[Person].[Person] P 
   9:  JOIN [AdventureWorks2008R2].[Person].[PersonPhone] PP
  10:  ON PP.[BusinessEntityID] = P.[BusinessEntityID]

I guess the use of such a query could be in providing data to other packages on all the individuals a given company is aware of.

At the next step we will add a cache connection manager:

image

and set to dump the data to a CAW file:

image

And let’s configure it like this:

image

The Index Position column if set to something greater than 0 simply denotes an index, let’s say a key or Primary Key if you wish.

Here is the zest – if you chose the file then any package in the project, not necessarily even a child package (run from the parent package) will have the access to the cache!

But wait, one more step is necessary before then: right-click on the Cache Connection Manager and choose the option “Convert to Project Connection” from the context menu items:

image

The next component to add to this DFT is the Cache Transform which we will configure as depicted below:

image

As you have probably already guessed, the BusinessEntityID we had in your SQL query is now the key column for lookups.

At this point, if we would want we could run this package already and the cache would be built and ready to be consumed. Right, we are almost done!

2. Consuming the cache in an external package

Sure we would be able to consume the cache inside the same package, this is the pre-SSIS 2012 trick, but our objective is to use it the SSIS 2012 way, in a package that is only part of the same project.

To do so right click on the SSIS Packages folder and choose the “New SSIS Package” option. Name the package. In my case, because I wanted to demonstrate how external data can be matched against the an existing cache and I came across the sick leave hours value in the Employee table I thought it could be a good candidate, so here it goes: a new package called collect sick leave encompassing a single DFT in which we have again a connection to the AdventureWorks2008R2 database bringing the contents of the Employee table with only a few columns chosen:

image

Remember when building your own packages the identity column has to exist, otherwise re-using the cache would not be possible.

So now drag and drop the cache transformation component and begin configuring it as follows:

image

An interesting discovery: the connection property page of the Lookup Transformation discovers and accepts the cache connection made in the package in step 1:

image

The columns are mapped like follows:

image

Two steps, and basically this is it to consuming the cache.

The output now can travel to one destination or another:

image

But what is important – there is no need to warm the cache up again anywhere in this project to be consumed by any other packages.

author: Compudicted | Posted On Saturday, March 10, 2012 10:28 PM | Comments (2)

Top 10+ SSIS 2012 Improvements


If you have not heard the latest technical news yet, the SQL Server 2012 Virtual Launch event is today March 8, 2012!

Register to not to miss this opportunity to interact with technology experts and win cool prizes!

I am thrilled and excited!

Why? Because like in the automotive industry, every new iteration of a vehicle is more fuel efficient, safer, more polished, looking more attractive and fun to drive.

Sure, the the new version of the SQL Server is ready to take me further in what I can deliver to my clients, empower me even more and make me a better professional!

Do not miss the chance knowing first hand what exactly is going to rock many corporate and smaller IT departments.

If the aforesaid is not enough to convince you to spend some time glancing at the future and ask fellow professionals on the new capabilities, then why not to even just enjoy some swags?

Yes, some free stuff you used to get when attending public events! How about a totally free ebook you can take anywhere? And namely: PowerShell Administrator’s Pocket Consultant by William Stanek! By the way, watch for even more books at the event for a great discount.

OK, now to the topic, my goal here is cover the top 10 (and more) improvements from my point of view in SQL Server Integration Services 2012 for a SSIS Developer.

1. ODBC Support

The ODBC support is becoming first class now I guess because of the future full integration with Hadoop and an increased demand to integrate more easily with various open source platforms. So I guess the days when you will be able to easily connect to a Linux machine from a SQL Server are coming. Attunity connectors also get more readily available and covering more vendors.

2. Change Data Capture for SSIS

The Change Data Capture (CDC) is not new to SQL Server, but it is a new kind of an animal to SSIS:

image

Now with CDC one can easily capture the changes in data sources and provide them for reporting, data analysis or feed into the Data Warehouse.

3. Support for Variable Number of Columns in a Flat File

This is a productivity enhancement that potentially pays for a good portion of the upgrade fee (IMHO). I just happen to see how many developers stumble upon such a reality unable to overcome this barrier resorting to various online forums or blogs. No longer!

If you see a file as depicted below:

image

No fear, it will be understood by the SSIS engine and handled without incidents:

image

Hooray! No more time wasted and scratching your head!

4. Revamped Configurations

This is another big improvement.

Did you ever wonder why you deployed a package and it took the design time parameters? Did you struggle to deploy your config files or a database along with the package?

No longer! You now can have several configurations, for Dev and Prod, no problem. If you envied your fellow C# or VB .Net developer being able to store parameters right in the Visual Studio, no more, now you can, too. As an aside, there is no more BIDS, there is the new Data Tools, but to me it is a Visual Studio 2010, I just develop special projects in it, and it is a 1st class tool! And how about this: you can even add parameters after the package has been deployed? Do you feel thrilled as me? Not yet, then how about the possibility of sharing parameters across many packages within a project?

5. Script Component – you can debug it, finally!

If your heart is not beating faster by now, then let’s recall how much you struggled to find out why a Script Component does not work as expected? A value, or worse yet, three are not right?

Remember? No? I do, I remember how I needed to build a console app till 10 PM to just solve the mystery why the values were wrong sitting along in the office biting nails because at midnight a package just had to load the latest flight data. I wish I could just debug the mysterious component with 400 lines of code. Sigh and smile, now I will:

image

Better yet, all my runtime values are captured. Did I say it is a Visual Studio?

6. SSIS Package Format Changed and the Specs are Open Source!

Bye-bye the lineage IDs, cryptic, long XML! Hello comparable, mergable packages!

imagevs. image

Easily compare packages with Diff tools now! Full specs are at: http://msdn.microsoft.com/en-us/library/gg587140.aspx

7. Built-in Reporting

Yes, there will be three canned reports provided for You, dear developer to benchmark, troubleshoot and just better support a live implementation:

image

image

image

8. Data Taps

This is totally new: have you ever been asked to fix a package with no rights to access the data source? I had such an “opportunity”, their DBA just shrugged off my requests to provide with a read only account. But now you are more in control, you can now turn on and off small data dumps to a CSV file for an ad-hock analysis. Those, most often, are instrumental in finding metadata differences and thus allowing a real quick fix to many issues. More on this topic is here: http://goo.gl/AUBP5

9. Deploying Projects from Visual Studio

Yes, like I said, Visual Studio is the centerpiece to developing and deploying a SSIS solution. Now you need to think more project oriented as a result, so there is a bit of paradigm shift, or I would say you need to think of a project as unit more than of a package now in SSIS 2012 (for those not ready for the change the old deployment model still works, so not to worry).

So what is different, actually all and more simple, you just deploy with a right-click on the project, no more fiddling around with the Deployment manifest or manual copy and paste, import, etc.

The configurations are taken care of automatically!

(picture is taken from Rafael Salas blog http://www.rafael-salas.com/2012/01/ssis-2012-project-deployment-model-and.html).

10. Manage with PowerShell

Did I mention about the PowerShell book at the beginning of the post? I did this on purpose Smile. SSIS 2012 provides with 1st class support to managing the SSIS indices as the SSIS catalog, package deployment and maintenance. You can craft and automate most tasks using an editor, just reference the needed module:

SNAGHTML75e1d8

There are also the APIs to validate a package, configure and deploy a package:

image

Oh, I have just already covered 10 improvements, wait but there are more:

  • Un-do and Re-do are now possible (I can hear the wow!);
  • New designer surface (AKA canvas) with adorners

image

  • Shared (across a project) Connection Managers (no more click and copy, pastes)!
  • Shared (across packages in project) Cache Managers
  • Do you remember the dreaded errors all over the package after some metadata changed? Now you can resolve them all up the stream with a single click!
  • Group items to reduce clutter without resorting to sequence containers:

image

  • The ability to rollback to an older (and working) version of a package:

image

I can hear the applause…

OK for now!

I hope I wet your appetite enough to go and explore the features yourself. And to stay always tuned do not forget too bookmark the aggregated SSIS Resources page: http://goo.gl/2WZxp!


PS: So all signs are, the SQL Server Integration Services 2012 is a huge incremental leap forward bringing better productivity, and thus shortened what I call “time-to-solution” cycle, improved connectivity and support.

Dare to explore!

Regards,

Arthur

author: Compudicted | Posted On Tuesday, March 6, 2012 12:26 AM | Comments (5)

Creating a Custom SSIS Data Flow Component - an Example With a Purpose


image

Last week I was asked to mask or obfuscate a large volume of data residing in a production database for our software quality assurance team to use in testing some functionality. The data resided in various tables and in several columns per table. Yet, this procedure would need to be done over and over again as the data gets depleted. Yet, the masking supposed to be hidden from the QA people and centrally controlled. I proposed to use SSIS. The next day I did a web scan to just find out that noting similar exists and even no good sample code can be found to drive the masking, even in the Microsoft’s All-In-One Code Framework library I could not find anything useful and ended up submitting a request to provide such a useful (at least IMHO) sample. So if you have spare 30 sec. please bother to vote here for my item.

  • The preamble and design considerations

To make the story short, I opted for creating my own custom Data Flow Component with my own data masking code. After it was successfully used by our QA for a week I decided it is time to share how I crated it. So here it goes:

To start off, I would say that there are no may examples on the web demonstrating how to create custom SSIS components. The best recourse (arguably perhaps) remains the MSDN. I was to the most part driven by two articles: the general MSDN How to on custom SSIS components development and an example by Benny Austin because I wanted a Data Flow Component similar to what he has created. I would like to note that looking back at what was done the SSIS Data Flow component mostly rotates around employing functions from these two areas: the design time and run-time. The design-time for a component starts from dragging and dropping it onto the DFT canvas (designer), and the run-time, as you have already probably guessed is when the component is at work. To implement your logic you would need to override each applicable function. To make your component addable to a Data Flow Transformation (later DFT), you simply start a new Class Library project (if you plan to use the component in SSIS 2008 then base it on .Net 3.5 and give it a cool default namespace name). The code provided here is in C#, but if my dear reader prefers to work in VB, feel free to convert it using this site: http://www.developerfusion.com/tools/convert/csharp-to-vb/

As an aside, my design goal was to automate the data masking as much as possible so another SSIS developer could set it up very quickly.

  • The (hard :-) coding part

In almost all implementations the ProvideComponentProperties needs to be set. For example in my case I ended up with:

   1:          public override void ProvideComponentProperties()
   2:          {
   3:              // Set component information
   4:              ComponentMetaData.Name = "Data Masker";
   5:              ComponentMetaData.Description = "A SSIS Data Flow Transformation Component To Provide Basic Data Masking";
   6:              ComponentMetaData.ContactInfo = "Arthur Zubarev";
   7:   
   8:              // Reset the component
   9:              base.RemoveAllInputsOutputsAndCustomProperties();
  10:   
  11:              // Add input objects
  12:              IDTSInput100 input = ComponentMetaData.InputCollection.New();
  13:              input.Name = "Input";
  14:              input.Description = "Contains un-masked columns.";
  15:   
  16:              // Add output objects
  17:              IDTSOutput100 output = ComponentMetaData.OutputCollection.New();
  18:              output.Name = "Output";
  19:              output.Description = "Contains masked columns. Gets set automatically.";
  20:              output.SynchronousInputID = input.ID; //Synchronous transformation
  21:   
  22:              //Add error objects
  23:              IDTSOutput100 errorOutput = ComponentMetaData.OutputCollection.New();
  24:              errorOutput.Name = "Error";
  25:              errorOutput.IsErrorOut = true;
  26:          }

Then the second most important place where much code can end up would be the DTSValidationStatus Validate procedure. This method fires each time component properties pertaining to its metadata get changed. Because I said I wanted to automate the data masking component usage I added some code to cover the output creation on the fly based on the [virtual] data columns chosen by a developer. The complete code follows:

   1:  public override DTSValidationStatus Validate()
   2:          {
   3:              // Determine whether the metdada needs refresh
   4:              IDTSInput100 input = ComponentMetaData.InputCollection[0];
   5:              IDTSVirtualInput100 vInput = input.GetVirtualInput();
   6:   
   7:              bool cancel = false;
   8:   
   9:              foreach (IDTSInputColumn100 column in input.InputColumnCollection)
  10:              {
  11:                  try
  12:                  {
  13:                      IDTSVirtualInputColumn100 vColumn = vInput.VirtualInputColumnCollection.GetVirtualInputColumnByLineageID(column.LineageID);
  14:                  }
  15:                  catch
  16:                  {
  17:                      ComponentMetaData.FireError(0, ComponentMetaData.Name, "The input column " + column.IdentificationString + " does not match a column in the upstream component.", "", 0, out cancel);
  18:   
  19:                      return DTSValidationStatus.VS_NEEDSNEWMETADATA;
  20:                  }
  21:              }
  22:   
  23:              // Validate input to be of type string/numeric only
  24:              bool pbCancel = false;
  25:              for (int x = 0; x < input.InputColumnCollection.Count; x++)
  26:              {
  27:                  if (!(input.InputColumnCollection[x].DataType == DataType.DT_STR ||
  28:                        input.InputColumnCollection[x].DataType == DataType.DT_WSTR ||
  29:                        input.InputColumnCollection[x].DataType == DataType.DT_DECIMAL ||
  30:                        input.InputColumnCollection[x].DataType == DataType.DT_NUMERIC ||                      
  31:                        input.InputColumnCollection[x].DataType == DataType.DT_TEXT))
  32:                  {
  33:                      ComponentMetaData.FireError(0, ComponentMetaData.Name,
  34:                                                  "Column " + input.InputColumnCollection[x].Name + " cannot be used for data masking."
  35:                          + Environment.NewLine + "The data type supplied was " + input.InputColumnCollection[x].DataType.ToString() + "."
  36:                          + Environment.NewLine + "The supported data types are DT_STR, DT_WSTR, DT_DECIMAL, DT_NUMERIC and DT_TEXT."
  37:                          + Environment.NewLine + "Unmark the offending column(s) to correct.", "", 0, out pbCancel);
  38:   
  39:                      return DTSValidationStatus.VS_ISBROKEN;
  40:                  }
  41:              }
  42:   
  43:              // Create corresponding output columns dynamically
  44:              IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
  45:   
  46:              foreach (IDTSInputColumn100 inputcolumn in input.InputColumnCollection)
  47:              {
  48:                  bool IsExist = false;
  49:                  foreach (IDTSOutputColumn100 OutputColumn in output.OutputColumnCollection)
  50:                  {
  51:                      if (OutputColumn.Name == "Masked_" + inputcolumn.Name)
  52:                      {
  53:                          IsExist = true;
  54:                      }
  55:                  }
  56:   
  57:                  if (!IsExist)
  58:                  {
  59:                      IDTSOutputColumn100 outputcol = output.OutputColumnCollection.New();
  60:                      outputcol.Name = "Masked_" + inputcolumn.Name;
  61:                      outputcol.Description = "Masked " + inputcolumn.Name;
  62:                      outputcol.SetDataTypeProperties(inputcolumn.DataType, inputcolumn.Length, inputcolumn.Precision, inputcolumn.Scale, inputcolumn.CodePage);
  63:                  }
  64:              }
  65:   
  66:              //Remove redundant output columns that don't match the input columns
  67:              if (output.OutputColumnCollection.Count > input.InputColumnCollection.Count)
  68:              {
  69:                  foreach (IDTSOutputColumn100 OutputColumn in output.OutputColumnCollection)
  70:                  {
  71:                      Boolean IsRedundant = true;
  72:                      foreach (IDTSInputColumn100 InputCoulmn in input.InputColumnCollection)
  73:                      {
  74:                          IsRedundant = OutputColumn.Name.Contains("Masked_" + InputCoulmn.Name) ? false : true;
  75:                          if (!IsRedundant)
  76:                              break;
  77:                      }
  78:   
  79:                      if (IsRedundant)
  80:                      {
  81:                          output.OutputColumnCollection.RemoveObjectByID(OutputColumn.ID);
  82:                      }
  83:                  }
  84:              }
  85:   
  86:              return DTSValidationStatus.VS_ISVALID;
  87:          }

Do not forget to clean out any invalid input (this code is provided in the complete source code and omitted in the article for brevity).

Another piece that I chose to omit is the prevention from adding user output to support the complete automation.

Congrads! If all is good by now, we are at the run-time methods part! Almost done, bear with me… well almost, if you want, even at this stage you can already use the component, yes, indeed, but it will not mask the data, you can however test its design time functionality, so if you are willing to do so, the skip to the Testing the component in a SSIS Project part below.

  • Run-time Methods Implementation

To implement the component’s data processing it is simply enough to override its PreExecute and ProcessInput methods as depicted below:

   1:  public override void PreExecute()
   2:          {
   3:              IDTSInput100 input = ComponentMetaData.InputCollection[0];
   4:              inputBufferColumnIndex = new int[input.InputColumnCollection.Count];
   5:   
   6:              for (int x = 0; x < input.InputColumnCollection.Count; x++)
   7:              {
   8:                  IDTSInputColumn100 column = input.InputColumnCollection[x];
   9:                  inputBufferColumnIndex[x] = BufferManager.FindColumnByLineageID(input.Buffer, column.LineageID);
  10:              }
  11:   
  12:              IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
  13:              outputBufferColumnIndex = new int[output.OutputColumnCollection.Count];
  14:   
  15:              for (int x = 0; x < output.OutputColumnCollection.Count; x++)
  16:              {
  17:                  IDTSOutputColumn100 outcol = output.OutputColumnCollection[x];
  18:                  
  19:                  // A synchronous output does not appear in output buffer, but in input buffer
  20:                  outputBufferColumnIndex[x] = BufferManager.FindColumnByLineageID(input.Buffer, outcol.LineageID);
  21:              }
  22:          }
 
   1:  // The actual data masking 
   2:          public override void ProcessInput(int inputID, PipelineBuffer buffer)
   3:          {
   4:              if (!buffer.EndOfRowset)
   5:              {
   6:                  while (buffer.NextRow())
   7:                  {
   8:                      for (int x = 0; x < inputBufferColumnIndex.Length; x++)
   9:                      {
  10:                          DataType BufferColDataType;
  11:   
  12:                          BufferColDataType = buffer.GetColumnInfo(inputBufferColumnIndex[x]).DataType;
  13:   
  14:                          if (!buffer.IsNull(x))
  15:                          {
  16:                              buffer.SetString(outputBufferColumnIndex[x], MaskData(buffer.GetString(inputBufferColumnIndex[x]))); 
  17:                          }
  18:                      }
  19:                  }
  20:              }
  21:          }

The data masking implementation is in the MaskData routine that can be changed to suite a different logic (data scrambling is implemented) or improved and/or enhanced:

   1:          // Provides a basic data masking with scrambling column content
   2:          public string MaskData(string InputData)
   3:          {
   4:              string MaskedData = InputData;
   5:   
   6:              if (MaskedData.Length > 0)
   7:              {
   8:                  // The technique used to mask the data is to replace numbers with random numbers and letters with letters
   9:                  char[] chars = new char[InputData.Length];
  10:   
  11:                  Random rand = new Random(DateTime.Now.Millisecond);                
  12:   
  13:                  int index = 0;
  14:   
  15:                  while (InputData.Length > 0)
  16:                  {
  17:                      // Get a random number between 0 and the length of the word.
  18:                      int next = rand.Next(0, InputData.Length - 1);
  19:   
  20:                      // Take the character from the random position and add to our char array.
  21:                      chars[index] = InputData[next];
  22:   
  23:                      // Remove the character from the word.
  24:                      InputData = InputData.Substring(0, next) + InputData.Substring(next + 1);
  25:   
  26:                      ++index;
  27:                  }
  28:   
  29:                  MaskedData = new String(chars);
  30:              }
  31:   
  32:              // Scrambled or empty
  33:              return MaskedData;
  34:          }

Guess what? Whoa, you are done!

Well… almost, but the fun continues, the next step is to try the component out!

  • Testing the component in a SSIS Project

To test the project up we ought to sign the DLL because it has to go to GAC. To do so just create a new SNK file and opt for password protection. If you do not know how then right-click on your project (not solution) and choose Properties, navigate to the Signing property page and enable the Sign the assembly option, the follow prompts.

End result would be something like here:

 

image

Build your project at this stage and copy the resulted DLL (typically found in Debug/Bin folder) to the <DRIVE LETTER>:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\ folder where you are likely to see other DLLs that were created by Microsoft. As the last step you have to add this DLL to GAC, too. You may want to drag and drop it to c:\WINDOWS\assembly. Just a quick note: in real life development cycle you will end up registering and unregistering (right click in GAC folder to unreg) DLLs often, so it makes sense to create a post-build step to execute that all for you, but then you will use GACUtil.exe in it as described here: http://bit.ly/z05Ab3

We are almost there. The next step is to bring a new instance up of BIDS or Visual Studio 2008 and create a new SSIS project. Now add this new component as follows:

Right click anywhere inside the Toolbox Data Transformation items area and select Choose Items in the context menu, go to the SSIS Data Flow Items and select the Data Masker component.

You should be able to see it like this:

image

The next step is to actually employ our component.

Drag and drop it to a DFT between a data source and destination, then configure it as in this figure:

image 

image

I used the AdventureWorks database from http://msftdbprodsamples.codeplex.com/ table HumanResources.Employee to mask the NationalIDNumber and LoginID columns. The output columns named “Masked_” were created at design time by the component itself. The end result is seen here:

image

The entire project was contributed to the CodePlex site and can be downloaded from there entirely or just the binaries:

image

author: Compudicted | Posted On Thursday, March 1, 2012 11:44 PM | Comments (0)