
SSIS Conditional Processing by day

I'm working with a client where they have different business rules based on the day data is processed. On Monday, they generate a test balance for their accounting process. On Wednesday, the data is hardened, and on Friday they compute the final balances. Physically, it was implemented as a set of precedence-constrained branches, one per day of interest.

So, what's the problem? The problem is the precedence constraints. This is the constraint expression for the Friday branch: DATEPART("DW", (DT_DBTIMESTAMP)GETDATE()) == 6. For those that don't read SSIS expressions, we start inside the parentheses:
  1. Get the results of GETDATE()
  2. Cast that to a DT_DBTIMESTAMP
  3. Determine the day of the week, DW, for our expression
  4. Compare the results of all of that to 6
Do you see the problem? Really, there are two, but the one I'm focused on is the use of GETDATE to determine which branch of logic is executed. Today is Monday and I need to test the logic that runs on Friday. Yes, I can run these steps in isolation, and given that I'm not updating the logic that fiddles with the branches, my change shouldn't have an adverse effect, but by golly, that sucks from a testing perspective. It's also really hard to develop unit tests when your input data is the server date. What are you going to do, allocate 5 to 7 days for testing, or change the server clock? I believe the answers are No and OH HELL NAH!

This isn't just an SSIS thing, either. I've seen the above logic in TSQL as well. If you pin your logic to getdate/current_timestamp calls, then your testing is going to be painful.
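To make that concrete, here is a rough sketch of what that anti-pattern tends to look like in TSQL. The procedure names are hypothetical, invented purely for illustration, and the day numbers assume the default US DATEFIRST setting.

-- A minimal sketch of the anti-pattern; dbo.ComputeTestBalance and
-- dbo.ComputeFinalBalance are made-up procedure names.
IF DATEPART(WEEKDAY, GETDATE()) = 2      -- Monday, assuming default US DATEFIRST
    EXECUTE dbo.ComputeTestBalance;
ELSE IF DATEPART(WEEKDAY, GETDATE()) = 6 -- Friday, same assumption
    EXECUTE dbo.ComputeFinalBalance;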

How do I fix this?

This is best left as an exercise to the reader based on their specific scenario but in general, I'd favor having a step establish a reference date for processing. In SSIS, it could be as simple as a Variable that is pegged to a value when the package begins that you could then override for testing purposes through a SET call to dtexec. Or you could be populating that from a query to a table. Or a Package/Project Variable and have the caller specify the day of the week. For the TSQL domain, the same mechanics could apply - initialize a variable and perform all your tests based on that authoritative date. Provide the ability to specify the date as a parameter to the procedure.
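As a sketch of that idea in TSQL (the procedure and parameter names are mine, not a prescribed implementation), the branching keys off a parameter that defaults to the current date but can be overridden for testing:

-- Hypothetical procedure: @ReferenceDate drives the branching instead of GETDATE().
CREATE PROCEDURE dbo.ProcessDailyBalances
    @ReferenceDate date = NULL
AS
BEGIN
    SET NOCOUNT ON;

    -- Default to "today" only when the caller doesn't supply a date.
    IF @ReferenceDate IS NULL
        SET @ReferenceDate = CAST(GETDATE() AS date);

    IF DATEPART(WEEKDAY, @ReferenceDate) = 6 -- Friday under default US DATEFIRST
    BEGIN
        -- final balance logic here
        PRINT 'Friday logic';
    END
END
GO

-- Testing the Friday branch on a Monday is now trivial.
EXECUTE dbo.ProcessDailyBalances @ReferenceDate = '20160219';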

Something, anything really, just take a moment and ask yourself - How am I going to test this?

But, what was the other problem?
Looks around... We are not alone. There's a whole other country called Not-The-United-States, or something like that - geography was never my strong suit - and damned if their first day of the week isn't Sunday. It doesn't even have to be a different country; someone might have set the server to use a different starting day of the week (assuming TSQL).

SET LANGUAGE ENGLISH;
DECLARE
    -- 2016 February the 15th
    @SourceDate date = '20160215';

SELECT
    @@LANGUAGE AS CurrentLanguage
,   @@DATEFIRST AS CurrentDateFirst
,   DATEPART(dw, @SourceDate) AS MondayDW;

SET LANGUAGE FRENCH;

SELECT
    @@LANGUAGE AS CurrentLanguageFrench
,   @@DATEFIRST AS CurrentDateFirstFrench
,   DATEPART(dw, @SourceDate) AS MondayDWFrench;
That's not going to affect me though, right? I mean sure, we're moving toward a georedundant trans-bipolar-echolocation-Azure-Amazon cloud computing infrastructure, but surely our computers will always be set to deal with my home country's default locale, right?
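If you do need a day-of-week comparison that holds up no matter what the language or DATEFIRST settings are, one approach (offered as a sketch, not the post's prescription) is to anchor the math to a date you know was a Monday:

DECLARE @SourceDate date = '20160215'; -- a Monday

-- 1900-01-01 was a Monday, so this returns 1 (Monday) through 7 (Sunday)
-- regardless of SET LANGUAGE or SET DATEFIRST.
SELECT (DATEDIFF(day, '19000101', @SourceDate) % 7) + 1 AS MondayBasedDayOfWeek;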

ETL file processing pattern


In this post, I'd like to talk about something I always feel silly mentioning because it seems so elementary, yet plenty of people stumble on it. The lesson is simple: be consistent. For file processing, the consistency I crave is knowing where my files are going to be for processing. This post will have examples in SSIS, but whether you're using a custom Python script, Informatica, or Pentaho Kettle, the concept remains true: have your files in a consistent location.

You can tell when I've been somewhere because I really like the following file structure for ETL processing.

root node

All ETL processing will use a common root node/directory. I call it SSISData to make it fairly obvious what is in there but call it as you will. On my dev machine, this is usually sitting right off C:. On servers though, oh hell no! The C drive is reserved for the OS. Instead, I work with the DBAs to determine where this will actually be located. Heck, they could make it a UNC path and my processing won't care because I ensure that root location is an externally configurable "thing." Whatever you're using for ETL, I'm certain it will support the concept of configuration. Make sure the base node is configurable.

A nice thing about anchoring all your file processing to a head location is that if you are at an organization that is judicious with handing out file permissions, you don't have to create a permission request for each source of data. Get your security team to sign off on the ETL process having full control to a directory structure starting here. The number of SSIS related permissions issues I answer on StackOverflow is silly.

Disaster recovery! If you need to stand up a new processing site to simulate your current production environment, it'd be really convenient to only have to pull in one directory tree and say "we're ready."

I find it useful to make the root folder a network share as well so that whatever team is responsible for supporting the ETL process can immediately open files without having to RDP into a box just to check data. Make the share read-only because SOX, SAS70, etc.

subject area

Immediately under my root node, I have a subject area. If you're just beginning your ETL, this is a place people usually skip. "We'll commingle all the files in one folder, it's easier." Or, they don't even think about it because we gotta get this done.

Please, create subject areas to segment your files. Let's look at some reasons why you are likely going to want some isolation. "Data.csv" - great, is that the Sales data or the Employee data? You may not know until you open it up because you're just receiving data with no control over what the original file is called. If you had work areas for your files, you'd be able to direct the data to be delivered to the correct location with no chance for one file to clobber another.

And while we're talking about processes you can't control, let's talk about how Marketing is just going to copy the file into the folder whenever it's ready. Creating a folder structure by subject area will allow you to better control folder permissions. Remember how I said to open a share? If the process is that Marketing copies the file, give them write-only access to the folder they need to deliver to. If everyone copies files to a common folder, how many curious eyes will want to open Salaries_2016.xlsx? Folders make a fine mechanism for separation of sensitive data.

Input

The first folder under a given subject area is called Input. If my process needs to find data, it need only look in the input folder. That's really about it, source data goes here.

Output

This is usually the least used of my folders, but I ensure every subject area has an "Output" folder. This way, I always know where I'm going to write output files. Output files might be immediately swept off to an FTP server or some other process consumes them, but this folder is where I put the data and where I can control access for external consumers of my data. I've been in places where developers made the great agreement of "you supply the data as CSV and we'll generate an Excel file when we're done." Except the people picking up the files weren't terribly computer savvy and didn't have their file extensions turned on... Yeah, so have an output folder and dump your data there.

Archive

This is my kitchen utility drawer. I throw everything in here when I'm done with it. For inbound files, the pattern looks something like
foreach source file
  1. process data
  2. move file to archive folder (possibly with date processed on the name)

Outbound processing is identical. The core process completes and generates files. A new process fires off and delivers those files.
foreach outbound file

  1. deliver file
  2. move file to archive folder (possibly with date processed on the name)

Two things to call out with this approach. The first is if you rename the files to have a date stamp on them. That's great for having a trail of when you actually processed the data. For example, we'd get in BigFile.csv on New Year's Day, but due to year-end processes running long, we didn't actually load the file until January 2nd. Thus, when it gets archived, we might tack on a processed date like BigFile_2016-01-02.csv. On January 5th, bad things happen and we have to restore the database to January 1st. ETL processing is no problem, you just copy those files back into the Input folder and, oh, we expect the file to be named BigFile.csv exactly. Now you have to manipulate the file name before you can reprocess data. That's a pain. Or if the process accepts a file mask, you'll end up with BigFile_2016-01-02_2016-01-05.csv in the Archive folder because now we have processed the file twice.

The second thing is to use your library methods for renaming files. Don't assume everything from the first period to the end of the file name is the file extension. Don't assume the file extension is 3 characters.

Do not archive your files with any date stamp that isn't in the form of yyyy-mm-dd. Month names sort horribly. When I'm looking for files, I'm looking by year, then month, then day. Do use a delimiter between year, month and day. I know, yyyymmdd is nicer in TSQL and such but for whatever reason, I find it harder to mentally parse in a file name.
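If you happen to build that stamp in TSQL, CONVERT style 23 produces the yyyy-mm-dd form directly. A small sketch; the file name pattern is just for illustration:

DECLARE @ProcessedDate date = '20160102';

-- Style 23 is yyyy-mm-dd, which sorts correctly as plain text.
SELECT 'BigFile_' + CONVERT(char(10), @ProcessedDate, 23) + '.csv' AS ArchiveFileName;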

SSIS file pattern

The following Biml expresses two different implementations of the same pattern. The first uses project and package level parameters in addition to SSIS variables. The Project level parameter expresses the root node. The package level parameter defines the Subject Area. Maybe Subject Area is promoted to project parameter, based on the size and scope of your work.

Within each package, we'll then use expressions to build a FolderBase which is RootNode + SubjectArea. We'll then use expressions to define FolderArchive, FolderInput, and FolderOutput. I name them in this manner because I want them to sort into the same area in my Variables window. If you really like to get clever, define a namespace for your variables beyond User, like "template" or "framework."


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Projects>
        <PackageProject Name="POC">
            <Parameters>
                <Parameter DataType="String" Name="FolderRoot">C:\ssisdata</Parameter>
            </Parameters>
            <Packages>
                <Package PackageName="FileProcessingParam"></Package>
            </Packages>
        </PackageProject>
    </Projects>
    <Packages>
        <Package Name="FileProcessingParam">
            <Parameters>
                <Parameter DataType="String" Name="SubjectArea">Sales</Parameter>
            </Parameters>
            <Variables>
                <Variable DataType="String" Name="FolderBase" EvaluateAsExpression="true">@[$Project::FolderRoot] + "\\" + @[$Package::SubjectArea]</Variable>
                <Variable DataType="String" Name="FolderArchive" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Archive"</Variable>
                <Variable DataType="String" Name="FolderInput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Input"</Variable>
                <Variable DataType="String" Name="FolderOutput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Output"</Variable>
            </Variables>
        </Package>
        <Package Name="FileProcessingClassic">
            <Variables>
                <Variable DataType="String" Name="FolderRoot">C:\ssisdata</Variable>
                <Variable DataType="String" Name="SubjectArea">Sales</Variable>
                <Variable DataType="String" Name="FolderBase" EvaluateAsExpression="true">@[User::FolderRoot] + "\\" + @[User::SubjectArea]</Variable>
                <Variable DataType="String" Name="FolderArchive" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Archive"</Variable>
                <Variable DataType="String" Name="FolderInput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Input"</Variable>
                <Variable DataType="String" Name="FolderOutput" EvaluateAsExpression="true">@[User::FolderBase] + "\\" + "Output"</Variable>
            </Variables>
        </Package>
    </Packages>
</Biml>
Using the above Biml generates two SSIS packages: FileProcessingParam which uses parameters as part of the 2012+ Project Deployment Model.

FileProcessingClassic is an approach that will work across all versions of SSIS from 2005 to 2016 whether you use the Project Deployment Model or the Package Deployment Model.

Take away

Create a consistent environment for all of your file based processing. It'll reduce the number of decisions junior developers need to make. It will lower the complexity of recreating your environment and all of its permissions as projects migrate through environments. It'll ensure everyone knows where the data should be in the event of an emergency or someone is on vacation. Finally, it simplifies your code because all I need to do to be successful is ensure the file lands in the right spot.

Biml Script Task - Test for Echo


To aid in debugging, it's helpful to have a "flight recorder" running to show you the state of variables. When I was first learning to program, the debugger I used was a lot of PRINT statements. "Verify your inputs before you assume anything" is a distillation of my experience debugging.

While some favor using MessageBox, I hate finding the popup window, closing it and then promptly forgetting what value was displayed. In SSIS, I favor raising events, FireInformation specifically, to emit the value of variables. This approach allows me to see the values in both the Progress/Execution Results tab as well as the Output window.

The syntax is very simple: whatever you want recorded, you pass in as the description argument. In this snippet, I'm going to build a string with three items in it - the variable's Namespace, Name, and Value.


bool fireAgain = false;
string message = "{0}::{1} : {2}";
var item = Dts.Variables["System::MachineName"];
Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
When that code runs, it will pull the @[System::MachineName] variable out of the SSIS Variables collection and assign it to item. We'll then fire an information message off with the bits of data we care about.

It seems silly, doesn't it, to print the variable's namespace and name there when we asked for it by name from the collection? Of course it is; a nicer, more generic snippet would be


bool fireAgain = false;
string message = "{0}::{1} : {2}";
foreach (var item in Dts.Variables)
{
Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
}
There, that's much better - however many Variables we pass in to the ReadOnly or ReadWrite collection will all get echoed back to the log and output window.

Biml

The following is almost turnkey, you simply need to specify all the variables you want emitted in the log there where it says "List all the variables you are interested in tracking". Follow the pattern and all the Variables you list will get tracked.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="so_36759813">
            <Variables>
                <Variable DataType="String" Name="echoBack">mixed</Variable>
            </Variables>
            <Tasks>
                <Script ProjectCoreName="ST_EchoBack" Name="SCR Echo Back">
                    <ScriptTaskProjectReference ScriptTaskProjectName="ST_EchoBack" />
                </Script>
            </Tasks>
        </Package>
    </Packages>
    <ScriptProjects>
        <ScriptTaskProject ProjectCoreName="ST_EchoBack" Name="ST_EchoBack" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <!-- List all the variables you are interested in tracking -->
                <Variable Namespace="System" VariableName="MachineName" DataType="String" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_EchoBack
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            bool fireAgain = false;
            string message = "{0}::{1} : {2}";
            foreach (var item in Dts.Variables)
            {
                Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
} </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
using System.Reflection;
using System.Runtime.CompilerServices;

[assembly: AssemblyVersion("1.0.*")]
</File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>
</Biml>

We could make this better by using some BimlScript to inspect the package and add all the variables in scope in our ReadWrite list but that's deeper than we're going to go in this post.

Now, if you'll excuse me, it's time to spin Test For Echo.

Biml SSIS Foreach Nodelist Container (aka Shred XML in SSIS)


Every time I look at my ForEach NodeList Enumerator post, I struggle to remember how to do half of it. Plus, as I discovered today, I am inconsistent between my images. The Control Flow shows an XML structure of


<Files>
<File>Foo.txt</File>
<File>Bar.txt</File>
<File>Blee.txt</File>
</Files>
but the text inside the Foreach loop would be for parsing an XML tree in this format

<Files>
<FileName>Foo.txt</FileName>
<FileName>Bar.txt</FileName>
<FileName>Blee.txt</FileName>
</Files>

The resulting package looks like this - a Foreach Enumerator that shreds the XML. We assign the shredded value into our variable CurrentNode. We pass that as an argument into a script task that does nothing but print the value as an Information event.

Configuration remains comparable to the previous post with the exception of using Variables.


Running the package generates Output like the following


Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Foo.txt
Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Bar.txt
Information: 0x0 at SCR Echo Back, SCR Echo Back: User::CurrentNode : Blee.txt

Biml

The following Biml will create a Foreach nodelist enumerator that shreds an XML recordset. We pass the current node variable into our script task that echoes the value back


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Task_ForEachNodeListLoop">
            <Variables>
                <Variable DataType="String" Name="CurrentNode"></Variable>
                <Variable DataType="String" Name="SourceXML"><![CDATA[<Files><File>Foo.txt</File><File>Bar.txt</File><File>Blee.txt</File></Files>]]></Variable>
                <Variable DataType="String" Name="OuterXPath"><![CDATA[/Files/File]]></Variable>
                <Variable DataType="String" Name="InnerXPath"><![CDATA[.]]></Variable>
            </Variables>
            <Tasks>
                <ForEachNodeListLoop
                    Name="FENLL Shred XML"
                    EnumerationType="ElementCollection"
                    InnerElementType="NodeText">
                    <VariableInput VariableName="User.SourceXML" />
                    <VariableOuterXPath VariableName="User.OuterXPath" />
                    <VariableInnerXPath VariableName="User.InnerXPath" />
                    <VariableMappings>
                        <VariableMapping VariableName="User.CurrentNode" Name="0" />
                    </VariableMappings>
                    <Tasks>
                        <Script ProjectCoreName="ST_EchoBack" Name="SCR Echo Back">
                            <ScriptTaskProjectReference ScriptTaskProjectName="ST_EchoBack" />
                        </Script>
                    </Tasks>
                </ForEachNodeListLoop>
            </Tasks>
        </Package>
    </Packages>
    <ScriptProjects>
        <ScriptTaskProject ProjectCoreName="ST_EchoBack" Name="ST_EchoBack" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <Variable Namespace="User" VariableName="CurrentNode" DataType="String" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_EchoBack
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            bool fireAgain = false;
            string message = "{0}::{1} : {2}";
            foreach (var item in Dts.Variables)
            {
                Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
} </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
using System.Reflection;
using System.Runtime.CompilerServices;

[assembly: AssemblyVersion("1.0.*")]
</File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>
</Biml>

SSIS - What is the name of the file


What is the name of the current file

It's common to not know the exact name of the file you're loading. You often have to apply some logic - a file mask, date calculation logic, etc. - to determine what file needs to be loaded. In SSIS, we often use a ForEach File Enumerator to accomplish this, but an Expression Task or even an Execute SQL Task can be used to retrieve/set a file name. Today, I'm going to show you two different mechanisms for identifying the current file name.

Did you know that in SSIS, the Flat File Source exposes an advanced property called FileNameColumnName? It shows up in the Properties window for the Flat File Source.

There are two different click paths for setting the FileNameColumnName property. The first is to right click on the Flat File Source and select the "Show Advanced Editor" option. There, navigate to Component Properties and you can set the FileNameColumnName property there.


The second is a combination of the Properties window and the Flat File Source itself. Select the Flat File Source and go to the Properties window. There you specify the FileNameColumnName property but notice, the Flat File Source itself is put into a Warning state. To fix that, we need to double click on the component and view the Columns tab. You'll notice the name we specified in the Properties window is now set in the Columns tab and the warning goes away.


Cool story, bro

That's cool and all, but it has two downsides. The first is due to me being a dumb American: the file name column added to the data flow is DT_WSTR (unicode/nvarchar) with a length of 260. Defaulting to that is awesome for internationalization. Except the systems I work in never have nvarchar defined, so now I need to use a Data Conversion component to change the supplied name into a non-unicode version. That's an irritant, but since I know what the pattern is, I can live with it.

The real downside with this approach is that it only works for Flat File Source. Excel, Raw File Source, and XML sources do not expose the FileNameColumnName property. Now that is a problem in my book because when I'm automating, I'd have to have one set of source patterns for flat files and a different one for non-flat files.

A better approach

So, as much as I like the built in solution, my pattern is to use a Derived Column to inject the file name into the Data Flow. I have a variable called CurrentFileName in all my packages. That contains the design-time path for my Flat File Connection Manager (or Excel). My Connection Manager will then have the ConnectionString/ExcelFilePath property assigned to be @[User::CurrentFileName]. This positions me for success because all I need to do is ensure that whatever mechanism I am using to determine my source file correctly populates that variable. In this post, a ForEach File Enumerator will handle that.

Within my Data Flow Task, I will add a Derived Column Transformation that adds my package variable into the data flow as a new column. Here, I am specifying it will be of data type DT_STR with a length of 130.

Biml

What would a post be without some Biml to illustrate the point?
I use the following as my source file.


Col1|Col2
1|2
2|3
3|4

This Biml will generate an SSIS package that has a Data Flow with a Flat File Source, a Derived Column and a Row Count. What you're interested in is seeing how we either specify the value for the FileNameColumnName in our FlatFileSource tag or enrich our data flow by adding it in our Derived Column component.


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <FlatFileConnection FilePath="C:\ssisdata\SO\Input\CurrentFileName_0.txt" FileFormat="FFF_CurrentFileName" Name="FFCM" />
    </Connections>
    <FileFormats>
        <FlatFileFormat Name="FFF_CurrentFileName" IsUnicode="false" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true">
            <Columns>
                <Column Name="Col1" DataType="Int32" Delimiter="|" />
                <Column Name="Col2" DataType="Int32" Delimiter="CRLF" />
            </Columns>
        </FlatFileFormat>
    </FileFormats>
    <Packages>
        <Package Name="CurrentFileName">
            <Connections>
                <Connection ConnectionName="FFCM">
                    <Expressions>
                        <Expression ExternalProperty="ConnectionString">@[User::CurrentFileName]</Expression>
                    </Expressions>
                </Connection>
            </Connections>
            <Variables>
                <Variable DataType="String" Name="CurrentFileName">C:\ssisdata\SO\Input\CurrentFileName_0.txt</Variable>
                <Variable DataType="Int32" Name="RowCountSource">0</Variable>
            </Variables>
            <Tasks>
                <ForEachFileLoop
                    Folder="C:\ssisdata\SO\Input"
                    FileSpecification="CurrentFileName*.txt"
                    Name="FELC Shred txt">
                    <VariableMappings>
                        <VariableMapping VariableName="User.CurrentFileName" Name="0" />
                    </VariableMappings>
                    <Tasks>
                        <Dataflow Name="DFT Import data">
                            <Transformations>
                                <FlatFileSource ConnectionName="FFCM" Name="FFCM Pull data" FileNameColumnName="CurrentFileNameSource" />
                                <DerivedColumns Name="DER Add CurrentFileName">
                                    <Columns>
                                        <Column DataType="AnsiString" Name="CurrentFileName" Length="130">@[User::CurrentFileName]</Column>
                                    </Columns>
                                </DerivedColumns>
                                <RowCount VariableName="User.RowCountSource" Name="CNT Source Data" />
                            </Transformations>
                        </Dataflow>
                    </Tasks>
                </ForEachFileLoop>
            </Tasks>
        </Package>
    </Packages>
</Biml>

If all goes well, when you run your package you should see something like the following

Building SSIS packages using the Biml object model


Programmatically building SSIS packages via the Biml Object Model

I thought it might be fun to try and figure out how to use the Biml API to construct SSIS packages. This post is the first in an occasional series as I explore and find neat new things.

Getting Started

The most important precursor to doing this is you will need a licensed installation of Mist. Full stop. The assemblies we're going to use have security built into them to tell whether they are licensed, and you cannot use the assemblies shipped with BidsHelper or BimlExpress as they're hardwired to those specific apps.

We're going to use two classes: AstRootNode and AstPackageNode.

Ast, what is that? Abstract Syntax Tree - it's a compiler theory thing.

AstRootNode? The root node is the <Biml /> tag. It contains all the collections in your biml declaration.

AstPackageNode? This is an instance of an SSIS package.


using Varigence.Languages.Biml;
using Varigence.Languages.Biml.Task;
using Varigence.Languages.Biml.Transformation;

...

AstRootNode arn = new AstRootNode(null);
AstPackageNode pkg = new AstPackageNode(arn);
arn.Packages.Add(pkg);

pkg.Name = "HelloWorld";

Now what?
You have two choices, you can get the Biml


Console.WriteLine(arn.GetBiml());

which results in


<Biml>
    <Packages>
        <Package Name="HelloWorld" />
    </Packages>
</Biml>

Or you can get the Xml


Console.WriteLine(arn.EmitAllXml());

which looks like


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="HelloWorld" />
    </Packages>
</Biml>

In this example, they're nearly identical except the Xml emission results in the namespace existing in the Biml declaration while the GetBiml call just returns Biml. Interestingly enough, if either is fed through BimlExpress, they'll both pass validation.

Biml Transformer Update a variable value


Biml

In preparation for Biml Hero training, I thought it would be a good idea to understand Biml Transformers. I have read that article many times but never really dug in to try and understand it, much less find a situation where it'd be something I needed. This post covers my first attempt to using one.

Use case: updating a variable value

Assume you have an SSIS variable in your Biml files whose value you need to update - the server died and you need a new server name patched in. You could do a search and replace in your Biml, or apply a configuration once you emit and deploy the SSIS package, but let's try the transformer.

Source biml

Let's use the following simple biml as our package. It's three sequence containers in a serial connection and a single variable ServerName with a value of SQLDEV.


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="OriginalPackage_LocalMerge" ConstraintMode="Linear">
            <Variables>
                <Variable DataType="String" Name="ServerName">SQLDEV</Variable>
            </Variables>
            <Tasks>
                <Container Name="SEQ 1" />
                <Container Name="SEQ 2" />
                <Container Name="SEQ 3" />
            </Tasks>
        </Package>
    </Packages>
</Biml>

Transformer biml

Our transformer is simple. We've specified "LocalMerge" as we only want to fix the one thing. That one thing, is an SSIS Variable named "ServerName".

What is going to happen is that we will redeclare the variable, this time we will specify a value of "SQLQA" for our Value. Additionally, I'm going to add a Description to my Variable and preserve the original value. The TargetNode object has a lot of power to it as we'll see over this series of posts on Biml Transformers.


<#@ target type="Variable" mergemode="LocalMerge" #>
<#
string variableName = "ServerName";
if (TargetNode.Name == variableName)
{
#>
<Variable DataType="String" Name="ServerName">SQLQA<Annotations><Annotation AnnotationType="Description">Value was <#= TargetNode.Value #></Annotation></Annotations></Variable>
<#
}
#>

So what's happening?

I think it's important to understand how this stuff works, otherwise you might not get the results you expect. The Biml compiler is going to take our source biml. If I had code nuggets in there, those get expanded and once all the "normal" biml is complete, then our transformers swoop in and are executed. This allows the transformers to work with the finalized objects, metadata is set and all that, prior to rendering actual SSIS packages.

Result

It doesn't look that amazing, I admit.


<Variable DataType="String" Name="ServerName">SQLQA
    <Annotations>
        <Annotation AnnotationType="Description">Value was SQLUAT</Annotation>
    </Annotations>
</Variable>
But conceptually, wouldn't it be handy to be able to selectively modify bits of a package? Someone didn't name their Tasks or Components well? You could have a transformer fix that. Someone forgot to add your logging/auditing/framework code? You could have a transformer fix that too!

Start thinking in Biml! Most of my answers on StackOverflow have biml in them simply because it makes describing ETL so much easier.

Caveats

Transformers require a license for Mist. They don't work in BIDS Helper, BimlExpress or BimlOnline.


Biml Reverse Engineer a database, a.k.a. Biml to the rescue

I'm at a new client and I needed an offline version of their operational data store (ODS) database schema. I don't know what I was expecting, but it wasn't 11,500 tables. :O That's a lot. First up to bat was Visual Studio Database Projects. I clicked Import, and you really have no options to winnow down the list of items you're importing via Import. Ten minutes later, the import timed out on spatial indexes. Which wouldn't be so bad except it's an all or nothing operation with import.

Fair enough, I'll use the Schema Comparison and only compare tables, that should make it less ugly. And I suppose it did but still, the operation timed out. Now what?

SSMS to the rescue. I right click on my database and select Generate Scripts and first off, I script everything but the tables. Which is amusing when you have 11.5k tables, checking and unchecking the table box causes it to spin for a bit. I generated a file for each object with the hope that if the operation goes belly up, I'll at least have some progress. Round 1, Winner! I had all my views, procedures, functions, data types (don't hate), all scripted out nice and neat. Round 2, I just selected tables. And failed.

Maybe I didn't want all the tables. They have the ODS broken out by schemas to identify the data source and I only wanted the CMS data for this first draft. I run back through the Generate Scripts wizard this time only selecting tables in the CMS schema. That significantly reduced the number of objects I needed to script but still, it failed. And my mouse finger was tired. There had to be a better way.

Of late, Biml seems to be that better way. In just a few lines, I created a connection to my database, reverse engineered the targeted schema and then wrote the SQL out to files (so I could then import them with a database project). How cool is that?

inc_Connections.biml

I first added a biml file to my SSIS project that contained an OLE DB Connection Manager to the database I was interested in.


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="ODS" ConnectionString="Data Source=localhost\DEV2014;Initial Catalog=ODS;Provider=SQLNCLI11.0;Integrated Security=SSPI;" />
    </Connections>
</Biml>

ExportTables.biml

Here's the "magic". There are three neat tricks in here.


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#@ template tier="1" #>
<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
<#@ import namespace="System.IO" #>
<#
var schema = new List<string>{"CMS"};
var ODSCM = RootNode.OleDbConnections["ODS"];
var ODSDB = ODSCM.GetDatabaseSchema(schema, null, ImportOptions.None);
string fileNameTemplate = @"C:\Users\fellowsb\Documents\ODSDB\{0}_{1}.sql";
string currentFileName = string.Empty;
foreach (var table in ODSDB.TableNodes)
{
currentFileName = string.Format(fileNameTemplate, table.Schema.Name, table.Name);
System.IO.File.WriteAllText(currentFileName, table.GetDropAndCreateDdl());
}
#>
</Biml>

Tiering

The first neat thing is line 2. I have a directive that tells the Biml compiler that this is a tier 1 file. I could have specified tier 3, tier 7, or tier 10; it really doesn't matter as long as it is greater than the value in inc_Connections.biml. Since I didn't specify a tier in that file, it's tier 0. I needed to use an explicit tier here because line 7 references an object in the RootNode (my connection manager) that won't be built until the connections file has been compiled. The takeaway for tiering: if you're referencing objects in the Biml object tree, you might need to specify tiers to handle build dependencies.

GetDatabaseSchema

Cathrine Wilhelmsen (b|t) did an excellent job covering GetDatabaseSchema, so I'll let you read her post and simply comment that this method allowed me to reverse engineer just the schema I was interested in.

GetDropAndCreateDdl

The last bit of magic is GetDropAndCreateDdl. It's an extension method that allows me to take the in memory representation of the table and emit the TSQL required to create that object. I enumerate through my TableNodes collection and for each object, I call the GetDropAndCreateDdl method and dump that to a file.

Gist available


Biml Hero Training, Day 1


In June of 2013, I created my first SSIS package with Biml. Three years later, I thought I had come so far, and yet today was my first day of Biml Hero training. Holy cow, there's a lot I have yet to learn. While I can't go into the details of the training due to the non-disclosure agreement, I wanted to take a moment and share some of the public things.

StructureEqual

The base object for all things Biml, AstNode, has a StructureEqual method. If I understood it correctly, I could use this method to determine whether my Biml representation of an object, like a table, is the same as a table that I just reverse engineered. That's pretty cool and something I'll need to play with. And thinking back, Scott once said something about how you could use Biml as a poor man's substitute for Schema Compare. I bet this is the trick to that.

designerbimlpath

As Cathrine notes, setting this attribute will give intellisense a shove in the right direction for fragments.

Extension methods

Technically, I already picked this trick up at Cathrine's excellent session.

Topological sorting

This was an in-depth Extension method but as with any good recursive algorithm it was precious few lines of code. Why I care about it is twofold: execution dependencies and as I type that, I realize lineage tracing would also fall under this, and foreign key traversal. For the former, in my world, I find I have the best success when my SSIS packages are tightly focused on a task and I use a master/parent package to handle the coordination and scheduling of sub-package execution. One could use an extension method to discover all the packages that implement an Execute Package Task and then figure out the ordering of dependent tasks. That could save me some documentation headaches.

Foreign key traversal is something that I think would be rather clever to do in Biml. When I reverse engineer a database, I can already pull in foreign key columns. What I can't do, at least easily with the current version is to figure out what the referenced table/column is. Think about it, if I know column SellerId in FactSales is foreign keyed to column Id in DimSeller (this is exposed in sys.foreign_key_columns) and SellerName is defined as unique, I could automate the building of lookups (based on name matches). If my fact's source query looks like SELECT SaleDate, Amount, SellerName FROM stagingTable, I could see if column names matched and auto inject lookups into my fact load.

Those were my public highlights. Tomorrow's another day and I can't wait to see what we do.

What table raised the error in SSIS?


Can I find the name of the table in SSIS that threw an error on insert?

There is a rich set of tables and views available in the SSISDB that operate as a flight recorder for SSIS packages as they execute. Markus Ehrenmüller (t) had a great question in Slack: in short, can you figure out what table is being used as a destination? I took a few minutes to slice through the tables to see if I could find it.

If it's going to be anywhere, it looks like you can find it in catalog.event_message_context

If someone is using an OLE DB Destination with the "Table or view" or "Table or view - fast load" settings, the name of the table will be in the event_message_context table. If they are using a variable name, then it's going to be trickier.


SELECT
EMC.*
FROM
catalog.event_message_context AS EMC
INNER JOIN
catalog.event_message_context AS AM
ON AM.event_message_id = EMC.event_message_id
AND AM.context_source_name = EMC.context_source_name
AND AM.context_depth = EMC.context_depth
AND AM.package_path = EMC.package_path
WHERE
-- Assuming OLE DB Destination
AM.property_name = 'AccessMode'
-- Father forgive me for this join criteria
AND EMC.property_name =
CASE
AM.property_value
-- Need to validate these values, look approximate
WHEN 0 THEN 'OpenRowset'
WHEN 3 THEN 'OpenRowset'
--WHEN 4 THEN ''
ELSE 'OpenRowsetVariable'
END
AND EMC.event_message_id IN
(
SELECT
DISTINCT
EM.event_message_id
FROM
catalog.event_messages AS EM
WHERE
-- if you know the specific operation/execution id, use it here
--EM.operation_id = @id
1=1
AND EM.message_type = 120
);

Let's break that down. We filter against the catalog.event_messages table for message_type of 120 because 120 is "Error". If you know the explicit operation_id that you are interested in, remove the 1=1 and patch in that filter.

We'll use that dataset to restrict the EMC aliased table to just things that were associated to the Error. The EMC set pulls back the row that will contain the table name. We need to further figure out which of the many property_values to display. This is where it gets ugly. I think what I have is working but I have a small SSISDB to reference at the moment. We need to pick either the property_name of OpenRowset or OpenRowsetVariable. That's why we join back to the event_message_context table and use the value of the AccessMode to determine what we need to filter the EMC against.
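As a companion, and purely as a sketch, the error text itself lives in catalog.event_messages, so you can pull it alongside the rows above to confirm you've landed on the right destination. The operation id value is a placeholder you would replace with your own.

-- Illustrative only: list the error messages for a given operation so they
-- can be matched against the property rows returned above.
DECLARE @id bigint = 0; -- replace with the operation/execution id you care about

SELECT
    EM.event_message_id
,   EM.package_name
,   EM.message_source_name
,   EM.message
FROM
    catalog.event_messages AS EM
WHERE
    EM.operation_id = @id
    AND EM.message_type = 120; -- 120 = Error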

Play with it, see if this helps you. I've added it to my SSIS SSMS Template queries so feel free to mash up those data sets as you see fit. If you find a better way to do it, I'd love to hear about it.

Biml adoption routes


One of the reasons I like using Biml to generate SSIS packages is that there is no "wrong" approach to doing so. Instead, the usage and adoption of Biml should be tailored to the individual, team or organization that is using it. Despite my having used Biml for four years now, I still use it in the following ways based on the use case.

How does this work again?

As a refresher, Biml is XML that describes business intelligence artifacts. Specifically, we are going to use it to generate SSIS packages. This Biml is fed through the Biml compiler via BIDS Helper, BimlExpress, Mist, BimlStudio or a command-line compilation and SSIS packages are generated. Once generated, those SSIS packages will be indistinguishable from packages generated using BIDS/SSDT. There is no requirement to have any special server software installed to run these packages. This is entirely a developer-centric tool.

Forward only

For those just getting started with Biml, this is probably the best investment for their energy. Just this past June, I was working with a client on a very brief engagement where I was using SAP B/W as a source. Despite my best efforts, I couldn't get the CustomComponent properties "just right" for the emitted SSIS package to work.

Whatever your reason, with this approach you use Biml to generate as much of your SSIS package as you can and then finish coding it by hand. This is how my first few projects were implemented, by the way. For my SAP B/W packages, I stubbed in a dummy source in my data flow, but the rest of my package was ready to go: my auditing, logging, row counts, even the destination was ready. All I had to do in the designer was open the package, replace the data flow source with the SAP B/W connector, and double click the destination to have the columns route properly by name matching. Visually, I think of this approach looking like this:

There is a clean break between the source Biml and the package to show we've modified the generated object. If we were to regenerate, we'd have to reapply the same modifications to get the package back to the current state.

Cyclical

This approach is for those who are getting their feet under them and want to get it all "right." The arrow from SSIS back to the Biml file shows the cycle of

  1. Modify Biml
  2. Generate package
  3. Test for correctness

I found this useful as I was learning the complete syntax for all the tasks and components I wanted to represent in Biml.

Metadata driven

This approach "puts it all together." From bottom to top, we take the Biml files we developed in the Cyclical phase and make them into "patterns." That doesn't have to be a complex endeavor, it could be as simple as putting a variable in for package name.

In the center, we have .NET scripts. This doesn't mean you need to be a developer and understand the intricacies of callbacks, lambda functions and asynchronous programming. If you can comprehend how to declare a variable and how to write a foreach loop, you know enough to get this done.

At the top is a metadata repository. Fancy words that mean "Excel". Or a CSV. Or a database table. Or any other place you might have recorded information that describes what you need to build. If the patterns are a cookie cutter, the .NET scripts the hand that pushes the cutter, then the metadata is your order telling you how many of each type of cookie to create.

All three of those work together to generate "flat" Biml which then takes the above route of being fed to the compiler and emitted as SSIS packages. You won't see the flat biml getting spat out, it's all going to be in computer memory but the process remains the same.

Use it

I think regardless of how you use Biml, it's worth your time to adopt it into your organization. It hastens your development speed, it drives consistency and there's no long-term commitment involved.


Generating an SSISDB DACPAC

Creating a DACPAC is easy*. Within SSMS, you simply select the database node, Tasks, Extract Data-Tier Application. I had need to get a database reference to the SSISDB for some reporting we were building out so I clicked along my merry way.

Set the properties you're interested in, which is really just file name

The next screen simply validates what you selected previously. It'd be nice if they took their cues from the SSISDeploymentWizard folks and built out your commandline options here but no worries.

And we wait for it to build our package, wait, what? Error?


TITLE: Microsoft SQL Server Management Studio
------------------------------
Validation of the schema model for data package failed.
Error SQL71564: Error validating element Signature for '[internal].[check_is_role]': The element Signature for '[internal].[check_is_role]' cannot be deployed. This element contains state that cannot be recreated in the target database.
Error SQL71564: Error validating element Signature for '[internal].[deploy_project_internal]': The element Signature for '[internal].[deploy_project_internal]' cannot be deployed. This element contains state that cannot be recreated in the target database.
Error SQL71564: Error validating element Signature for '[internal].[set_system_informations]': The element Signature for '[internal].[set_system_informations]' cannot be deployed. This element contains state that cannot be recreated in the target database.
Error SQL71564: Error validating element Signature for '[internal].[start_execution_internal]': The element Signature for '[internal].[start_execution_internal]' cannot be deployed. This element contains state that cannot be recreated in the target database.
Error SQL71564: Error validating element Signature for '[internal].[stop_operation_internal]': The element Signature for '[internal].[stop_operation_internal]' cannot be deployed. This element contains state that cannot be recreated in the target database.
Error SQL71564: Error validating element Signature for '[internal].[validate_package_internal]': The element Signature for '[internal].[validate_package_internal]' cannot be deployed. This element contains state that cannot be recreated in the target database.
Error SQL71564: Error validating element Signature for '[internal].[validate_project_internal]': The element Signature for '[internal].[validate_project_internal]' cannot be deployed. This element contains state that cannot be recreated in the target database.
Error SQL71564: Error validating element Signature for '[internal].[check_schema_version_internal]': The element Signature for '[internal].[check_schema_version_internal]' cannot be deployed. This element contains state that cannot be recreated in the target database.
Error SQL71564: Error validating element Signature for '[internal].[get_database_principals]': The element Signature for '[internal].[get_database_principals]' cannot be deployed. This element contains state that cannot be recreated in the target database.
Error SQL71564: Error validating element Signature for '[internal].[get_space_used]': The element Signature for '[internal].[get_space_used]' cannot be deployed. This element contains state that cannot be recreated in the target database.
Error SQL71564: Error validating element Signature for '[internal].[get_principal_id_by_sid]': The element Signature for '[internal].[get_principal_id_by_sid]' cannot be deployed. This element contains state that cannot be recreated in the target database.
(Microsoft.SqlServer.Dac)
------------------------------
BUTTONS: OK
------------------------------

This was with the August release of SSMS 2016, which was beset with some defects, so I thought I'd try SSMS 2014, but I got the same results. The best I could gather from searching about was that there was some validation occurring that you should be able to disable, but I couldn't find any switches to throw in the GUI. But as I've already said, never trust the SSMS GUI - you knew this. To the command line we go!

Generating a DACPAC from the command line

To generate a dacpac from the command line, you need to find a version of SqlPackage.exe (dir /s /b C:\sqlpackage.exe will turn them up), and I went ahead and made sure I used a version that matched my database instance.

I use named instances on my machines (DEV2012/DEV2014/DEV2016) so the following block shows my extraction of each SSISDB into a version named file.


C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin>.\sqlpackage /Action:Extract /SourceDatabaseName:"SSISDB" /SourceServerName:localhost\dev2012 /TargetFile:"C:\Src\SSISDB_2012.dacpac"
Connecting to database 'SSISDB' on server 'localhost\dev2012'.
Extracting schema
Extracting schema from database
Resolving references in schema model
Successfully extracted database and saved it to file 'C:\Src\SSISDB_2012.dacpac'.

C:\Program Files (x86)\Microsoft SQL Server\120\DAC\bin>.\sqlpackage /Action:Extract /SourceDatabaseName:"SSISDB" /SourceServerName:localhost\dev2014 /TargetFile:"C:\Src\SSISDB_2014.dacpac"
Connecting to database 'SSISDB' on server 'localhost\dev2014'.
Extracting schema
Extracting schema from database
Resolving references in schema model
Successfully extracted database and saved it to file 'C:\Src\SSISDB_2014.dacpac'.

C:\Program Files (x86)\Microsoft SQL Server\130\DAC\bin>.\sqlpackage /Action:Extract /SourceDatabaseName:"SSISDB" /SourceServerName:localhost\dev2016 /TargetFile:"C:\Src\SSISDB_2016.dacpac"
Connecting to database 'SSISDB' on server 'localhost\dev2016'.
Extracting schema
Extracting schema from database
Resolving references in schema model
Successfully extracted database and saved it to file 'C:\Src\SSISDB_2016.dacpac'.

DACPACs are handy as database references but getting the SSISDB extracted took more than a simple point and click.

*for certain definitions of "easy"

Resolving the Biml project level connection manager issue


Biml connection manager ids broken

Well, it's not quite that dire but it sure can seem like it. You build out N packages that use project level connection managers and all is well and good until you open them up and they're all angry with red Xs in them. Or as this person encountered, the OLE DB Source defaulted to the first connection manager and table it found.

Root cause

I say root cause without looking at source code because the guys at Varigence are way smarter than I can hope to be. I can, however, look at cause and effect and mitigate as I can. What I see happening is that the packages generated in round 1 have their connection manager ids (ugly guids inside the XML) set, those match the project level Connection Manager, and all is good. You then generate more packages, and whether you overwrite the existing Connection Managers determines whether you break the existing packages or the new ones. Sophie's choice, eh?

The good thing, is that there's a fairly simple approach to solving this issue. For your project level connection managers, assign an explicit GUID and then reference that same guid in your packages. This approach will require tiering but it at least "works on my computer".

Environments.biml

Since we'll provide no explicit tier and there's no script in there, this will be tier 0. If you need to dynamically define the values in your environment file, just be sure it is a lower tier than the subsequent files. In this file, we simply enumerate our Connections. You'll see I have two connections: one project level, one not.


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<AdoNetConnection
Name="CM_ADO_DB"
ConnectionString="Data Source=localhost\dev2014;Integrated Security=SSPI;Connect Timeout=30;Database=msdb;"
Provider="SQL"/>
<OleDbConnection
Name="CM_OLE"
ConnectionString="Data Source=localhost\dev2014;Initial Catalog=tempdb;Provider=SQLNCLI11.0;Integrated Security=SSPI;"
CreateInProject="true"/>
</Connections>
</Biml>

Projects.biml

There is no attribute in the Connections collection to assign a guid. It's simply not there. If you want to associate an Id with an instance of a Connection your choices are the Project node and the Package node. Since we're dealing with project level connection managers, we best cover both bases to ensure Ids synchronize across our project. If you wish, you could have embedded this Projects node in with the Connections but then you'd have to statically set these Ids. I feel like showing off so we'll go dynamic.

To start, I define a list of static GUID values in the beginning of my file. Realistically, we have these values in a table and we didn't go with "known" values. The important thing is that we will always map a guid to a named connection manager. If you change a connection manager's definition from being project level to non, or vice versa, this will result in the IDs shifting and you'll see the same symptoms as above.

I use Biml to inspect itself, thus the need for tiering, and for all the Connection managers I find that satisfy the criteria of CreateInProject == true, I want to define them within my Connections collection inside my PackageProject node.


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
    // Build a repeatable list of GUIDs
    List<Guid> staticGuids = new List<Guid>();
    staticGuids.Add(Guid.Parse("DEADBEEF-DEAD-BEEF-DEAD-BEEFDEADBEEF"));
    staticGuids.Add(Guid.Parse("DEADBEEF-DEAD-BEEF-DEAD-BEEF8BADF00D"));
    staticGuids.Add(Guid.Parse("DEADBEEF-DEAD-BEEF-DEAD-D15EA5EBAD00"));
    staticGuids.Add(Guid.Parse("FEEDFACE-DEAD-BEEF-FACE-FEED00000000"));
#>
    <Projects>
        <PackageProject Name="ConnectionManagerIssue">
            <Connections>
<#
    Guid g = new Guid();
    // Only generate a Connection node for project level connections
    foreach (var item in RootNode.Connections.Where(x => x.CreateInProject))
    {
        // Pop the first element so we don't repeat
        g = staticGuids[0];
        staticGuids.RemoveAt(0);
#>
                <Connection ConnectionName="<#= item.Name #>" Id="<#= g.ToString().ToUpper() #>" />
<#
    }
#>
            </Connections>
        </PackageProject>
    </Projects>
</Biml>

Packages.biml

We'll use much the same trick except that now we'll inspect the Projects node to find all the connection managers. By default, we'll only have the Project level ones defined there so it's a bit easier. Build the package as normal but at the bottom, stub in the Connections collection and then populate the connection managers with their Ids. Since I'm lazy, I'm going to just call GetBiml for all the connection managers I find in the Projects node collection since they have the same attributes of Name and Id.


<#@ template tier="20" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
<# foreach (int index in Enumerable.Range(0, 5)) { #>
        <Package Name="Fixed_0<#= index #>">
            <Variables>
                <Variable DataType="String" Name="QuerySource">SELECT 100 AS foo;</Variable>
                <Variable DataType="Int32" Name="RowCountSource">0</Variable>
            </Variables>
            <Tasks>
                <Dataflow Name="DFT Demo">
                    <Transformations>
                        <OleDbSource ConnectionName="CM_OLE" Name="DFT Source">
                            <VariableInput VariableName="User.QuerySource"></VariableInput>
                        </OleDbSource>
                        <RowCount VariableName="User.RowCountSource" Name="RC Source" />
                    </Transformations>
                </Dataflow>
            </Tasks>
            <Connections>
<#
    foreach (var x in RootNode.Projects.SelectMany(x => x.AllDefinedSuccessors()).OfType<AstConnectionReferenceBaseNode>())
    {
        WriteLine(x.GetBiml());
    }
#>
            </Connections>
        </Package>
<# } #>
    </Packages>
</Biml>

Wrap up

I tack the Connection logic at the bottom of my packages because when Varigence gets the next release of BimlExpress out, I expect this will be resolved so I can just snip that unwanted code out.

Happy Biml'ing

Generating a workload for AdventureWorks


Just a quick note: if you need to generate some database activity, Jonathan Kehayias has The AdventureWorks2008R2 Books Online Random Workload Generator, which works "fine enough" for AdventureWorks2014.

Run AdventureWorks BOL Workload.ps1

I modified line 5 to point to my server and line 21 to point to AdventureWorks2014.

AdventureWorks BOL Workload.sql

Despite the PowerShell script setting the database context, the accompanying .sql file has explicit USE statements but a quick search and replace for AdventureWorks2008R2 -> AdventureWorks2014 had me up and running.

Thank you to Jonathan for the handy script. Now if you'll excuse me, I have query activity to capture.

UNION removes duplicates


UNION removes duplicates

When you need to combine two sets of data, you use the UNION operator. It comes in two flavors: UNION and UNION ALL. The default, UNION, removes duplicates between the two sets whereas UNION ALL does no filtering.

Pop quiz! Given the following sets A and B

What's the result of SELECT * FROM A UNION SELECT * FROM B;

Piece of cake, we start with everything in A and get the values in B that aren't in A.


So we're looking at 1, 5, 9, 7, 3, 3, 2, 3

Except of course that's not what is actually happening. UNION is actually going to smash both sets of data together and then take the distinct results. Or it does a distinct within each result set, smashes them together and takes one last pass to remove duplicates. I don't know or care about the actual mechanics, what I care about is the final outcome.

We actually end up with a result of 1, 5, 9, 7, 3, 2. In the fifteen years I've been writing SQL statements, I don't think I ever realized that behavior of the final result set being distinct. I thought it was purely an intra set dedupe process.

I thought wrong
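
The sets from the original post were shown in images that aren't reproduced here, so as a stand-in, assume A = {1, 5, 9} and B = {7, 3, 3, 2, 3}. The same semantics are easy to demonstrate in C#, since LINQ's Union removes duplicates across and within both inputs while Concat behaves like UNION ALL:

using System;
using System.Linq;

class UnionDemo
{
    static void Main()
    {
        // Assumed stand-ins for the sets pictured in the original post
        int[] a = { 1, 5, 9 };
        int[] b = { 7, 3, 3, 2, 3 };

        // Union dedupes across AND within both inputs: 1, 5, 9, 7, 3, 2
        Console.WriteLine(string.Join(", ", a.Union(b)));

        // Concat, the UNION ALL analogue, keeps everything: 1, 5, 9, 7, 3, 3, 2, 3
        Console.WriteLine(string.Join(", ", a.Concat(b)));
    }
}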


Debugging Biml


Debugging Biml

At this point, I don't even know whom to credit for this tip/trick as I've seen it from so many luminaries in the field. This mostly applies to BimlScript debugging within the context of BIDS Helper/BimlExpress.

Using tooling is always a trade-off between time/frustration and monetary cost. BIDS Helper/BimlExpress are free so you're prioritizing cost over all others. And that's ok, there's no judgement here. I know what it's like to be in places where you can't buy the tools you really need. One of the hard parts about debugging the expanded Biml from BimlScript is you can't see the intermediate or flat Biml. You've got your Metadata, Biml and BimlScript and a lot of imagination to think through how the code is being generated and where it might be going wrong. That's tough. Even at this point where I've been working with it for four years, I can still spend hours trying to track down just where the heck things went wrong. SPOILER ALERT It's the metadata, it's always the metadata (except when it's not). I end up with NULLs where I don't expect it or some goofball put the wrong values in a field. But how can you get to a place where you can see the result? That's what this post is about.

It's a trivial bit of code but it's important. You need to add a single Biml file to your project and whenever you want to see the expanded Biml, prior to it being translated into SSIS packages, right click on the file and you'll get all that Biml dumped to a file. This recipe calls for just a few steps.

WriteAll.biml

Right click on your project and add a Biml file called WriteAll.biml. Or whatever makes sense to you. I like WriteAll because it will generally sort to the bottom of my list of files alphabetically and that's about as often as I hope to use it.

Tiering

The first thing we need to do is ensure that the tier of this BimlScript file is greater than any other asset in the project. We will do that through the directive of template tier="N" where N is a sufficiently large number to ensure we don't have any natural tiers greater than it.

I'll also take this as an opportunity to impart a lesson learned from writing Apple Basic many, many years ago. Do not use one as the step value for line numbers, or tiers in this case. Instead, give yourself some breathing room and count by 10s because sure as you're breathing, you'll discover you need to insert something between 2 and 3 and you can't add 2.5, much less 2.25. The same lesson goes with tiers. Tier 0 is flat Biml. Tier 1 is BimlScript that doesn't specify its tier. After that you're in control of your destiny.

WriteAllText

The .NET library offers a method called WriteAllText. This is the easiest method to write all the text to a file. It takes two arguments: the path of the file and the contents to write. If the file exists, it's going to overwrite it. If it doesn't exist, it will create it. Piece of pie!
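
A tiny sketch of the call in isolation (the path here is just an example, not where we'll ultimately write the file):

// Creates or overwrites C:\temp\Debug.biml with whatever string we hand it
System.IO.File.WriteAllText(@"C:\temp\Debug.biml", "<Biml xmlns=\"http://schemas.varigence.com/biml.xsd\" />");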

Path.Combine

WriteAllText needs a path - where should we put it? I'm lazy and want to put our debugging file into a location everyone has on their computer. I can't tell you what that location will be because it's going to be different for everyone but it's guaranteed to exist. It's the %userprofile% location. On my work laptop, it's C:\Users\BillFellows. On my home computer, it's C:\users\bfellows. At the governmental agency, my home directory was actually on a network somewhere so it was just H:\. All you have to do is open up Windows Explorer and type %userprofile% and that's where we'll write this file.

If you are ever putting paths together through string building, please stop. It's a pain to deal with escaping the path separators, \, and it can be difficult to be consistent as some will build a path with a trailing slash and others won't. Stop trying to figure out that logic and use Path.Combine.

We'll combine the special path location with a file name, Debug.biml and get a perfectly valid path for our output file. If you don't want overkill, then just make a hardcoded path.
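
Putting those two pieces together, a sketch of just the path-building step looks like this:

// Resolve %userprofile% and append the file name without any manual separator handling
string debugPath = System.IO.Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
    "Debug.biml");
// e.g. C:\Users\BillFellows\Debug.biml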

GetBiml

Every object in the Biml universe supports the GetBiml method. What's amazingly powerful about this function is that it has the ability to call the GetBiml method on all the items under it. You don't have to worry about how many packages exist and how many Tasks and Variables and Events exist under them. Just call the appropriate parent level GetBiml method and object inheritance takes care of the rest.

RootNode

The RootNode is the base of everything in Biml so by calling its GetBiml method, you'll get the Biml for all the derived objects within the project. Eureka! That's what we wanted! And since we won't call this until everything else has completed, thanks to the tier directive, we will get our flattened Biml.

WriteAll.biml

Putting all that together, we get a file that looks like this


<#@ template tier="999" #>
<#
System.IO.File.WriteAllText(System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), "Debug.biml"), RootNode.GetBiml());
#>
If I want to see what's being built in ComplexBimlScript, I simply multiselect it along with WriteAll.biml and I'll get a Debug.biml file. From there, I generally open Debug.biml in a separate SSIS project and Check Biml For Errors, and it's much easier to zip to the error. Then it's a matter of tracing where that bad code is generated back to the correct bit of Biml.

Closing thoughts

If you get some really weird error going on inside your BimlScript, this debug file will appear to be an empty Biml tag. In that case, it's probably your metadata so start breaking your solution down until it's working and then gradually add complexity back into it.

p.s.

An alternative to tearing your code apart until you find what works would be to use this WriteAllText approach but do it per tier. That would allow you to inspect the compilation at every step in the process to discern where things went wrong, as sketched below.
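
A minimal sketch of that idea, assuming a hypothetical checkpoint file you drop in at whatever tier you want to inspect (the tier number and the file name are arbitrary, not part of the original project):

<#@ template tier="25" #>
<#
// Hypothetical checkpoint: dump whatever the compiler has accumulated once the
// tiers below 25 have finished, into a tier-specific file next to Debug.biml
System.IO.File.WriteAllText(
    System.IO.Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
        "Debug_Tier25.biml"),
    RootNode.GetBiml());
#>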

What packages still use Configuration?


What packages still use Configurations?

I'm sitting in Tim Mitchell's excellent "Deep Dive into the SSISDB" session and someone asked how they can figure out what packages use the classic deployment model's Configuration option.

Create an SSIS package. Add a Variable to your package called FolderSource and assign it the path to your SSIS packages. Add a Script Task to the package and then add @[User::FolderSource] to the ReadOnlyVariables list.

Double click the script, assuming C#, and when it opens up, use the following script as your Main


public void Main()
{
    // Assign the SSIS Variable's value to our local variable
    string sourceFolder = Dts.Variables["FolderSource"].Value.ToString();
    Microsoft.SqlServer.Dts.Runtime.Application app = new Microsoft.SqlServer.Dts.Runtime.Application();
    string message = "Package {0} uses configuration {1}";
    bool fireAgain = false;
    Package pkg = null;
    foreach (string packagePath in System.IO.Directory.GetFiles(sourceFolder, "*.dtsx", System.IO.SearchOption.AllDirectories))
    {
        try
        {
            pkg = app.LoadPackage(packagePath, null);
            // EnableConfigurations is a boolean specifying whether you have checked the first button
            if (pkg.EnableConfigurations)
            {
                Dts.Events.FireInformation(0, "Configuration Finder", string.Format(message, packagePath, string.Empty), string.Empty, 0, ref fireAgain);

                // This will expose all the configurations that are being used
                // because you could have specified different configuration mechanisms
                foreach (Configuration config in pkg.Configurations)
                {
                    Dts.Events.FireInformation(0, "Configuration Details", string.Format(message, packagePath, config.ConfigurationType), string.Empty, 0, ref fireAgain);
                }
            }
        }
        catch (Exception ex)
        {
            Dts.Events.FireWarning(0, "Config finder", packagePath, string.Empty, 0);
            Dts.Events.FireWarning(0, "Config finder", ex.ToString(), string.Empty, 0);
        }
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}

Save and close the script editor and hit F5.

How cool is that, we're using an SSIS package to inspect the rest of our packages. Now, if you store your packages in the MSDB, the above changes ever so slightly. We'd need to provide a connection string to the database and then change our first foreach loop to enumerate through all the packages in the MSDB. Perhaps that'll be a followup post.
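
Until that followup materializes, here's a rough sketch of what the MSDB flavor might look like as a standalone console program. The server name "localhost", the null credentials (Windows authentication), and the Console output are all placeholders, and it assumes a reference to Microsoft.SqlServer.ManagedDTS:

using System;
using System.Collections.Generic;
using Microsoft.SqlServer.Dts.Runtime;

class MsdbConfigurationFinder
{
    static void Main()
    {
        Application app = new Application();
        // Walk the MSDB package store starting at the root folder
        Queue<string> folders = new Queue<string>();
        folders.Enqueue(@"\");

        while (folders.Count > 0)
        {
            string folder = folders.Dequeue();
            foreach (PackageInfo info in app.GetPackageInfos(folder, "localhost", null, null))
            {
                string itemPath = folder.TrimEnd('\\') + @"\" + info.Name;
                if (info.Flags == DTSPackageInfoFlags.Folder)
                {
                    // Subfolder: queue it up for a later pass
                    folders.Enqueue(itemPath);
                }
                else
                {
                    Package pkg = app.LoadFromSqlServer(itemPath, "localhost", null, null, null);
                    if (pkg.EnableConfigurations)
                    {
                        Console.WriteLine("{0} uses configurations", itemPath);
                    }
                }
            }
        }
    }
}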

Playing audio via Biml


Playing audio via Biml

How often do you need to play audio while you're compiling your Biml packages? Never? Really? Huh, just me then. Very well, chalk this blog post up as one to show you that you really can do *anything* in Biml that you can do in C#.

When I first learned how to play audio in .NET, I would hook the Windows Media Player dll and use that. The first thing I then did was create an SSIS package that had a script task which played the A-Team theme song while it ran. That was useless but a fun demo. Fast forward to using Biml and I could not for the life of me get the Windows Media Player to correctly embed in a Biml Script Task. I suspect it's something to do with the COM bindings that Biml doesn't yet support. Does this mean you shouldn't use Biml? Hell no. It just means I've wandered far into a corner case that doesn't yet have support.

Hours before going on the stage for my Summit 2016 presentation, I took another crack at finding a way to play music via .NET and discovered the System.Media.SoundPlayer class and I was ecstatic.

Tada!

You understand this code, it's not hard. I create a string variable to hold the path to my sound file. I picked a sound file in a well known location. I prefaced my string with the @ symbol to avoid having to escape the default windows path separator.


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
    string sourceFile = string.Empty;
    sourceFile = @"C:\Windows\Media\tada.wav";
    System.Media.SoundPlayer player = new System.Media.SoundPlayer(sourceFile);
    player.Play();
#>
</Biml>

SSIS package that plays music

Using the above knowledge, we can also have an SSIS package with a script task to play an arbitrary media file


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="SoundPlayer">
            <Variables>
                <Variable Name="AudioPath" DataType="String">http://www.moviewavs.com/0053148414/WAVS/Movies/Star_Wars/imperial.wav</Variable>
            </Variables>
            <Tasks>
                <Script ProjectCoreName="ST_PlayAudio" Name="SCR Echo Back">
                    <ScriptTaskProjectReference ScriptTaskProjectName="ST_PlayAudio" />
                </Script>
            </Tasks>
        </Package>
    </Packages>
    <ScriptProjects>
        <ScriptTaskProject ProjectCoreName="ST_PlayAudio" Name="ST_PlayAudio" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <!-- List all the variables you are interested in tracking -->
                <Variable Namespace="User" VariableName="AudioPath" DataType="String" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_PlayAudio
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            string sourceFile = string.Empty;
            sourceFile = Dts.Variables[0].Value.ToString();
            System.Media.SoundPlayer player = new System.Media.SoundPlayer(sourceFile);
            player.Play();
            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
} </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
using System.Reflection;
using System.Runtime.CompilerServices;

[assembly: AssemblyVersion("1.0.*")]
                </File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>
</Biml>

Now, you could marry the two Biml snippets together so that you get audio playing while you build an SSIS package that plays audio, Dawg.
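
Purely as a sketch of what that marriage might look like (the tada.wav path is the same sample file from earlier, and the inner package definition is elided; it's the SoundPlayer package from above):

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
    // Play a fanfare while the compiler emits the packages defined below
    System.Media.SoundPlayer player = new System.Media.SoundPlayer(@"C:\Windows\Media\tada.wav");
    player.Play();
#>
    <Packages>
        <!-- the SoundPlayer package definition from above goes here -->
    </Packages>
</Biml>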

Biml Database Inspection


Biml Database Inspection

Importing tables via Biml

I've mentioned how using Biml to reverse engineer a very large database was the only option, and there is plenty of great material in the community about how to do this, but one thing I kept stumbling over was that using the import methods to build this sort of Biml always seemed to fail somewhere along the way. I assumed it was just me not understanding how it works. But today someone else got bit by the same stumbling block, so I wanted to talk through the basics of how the modeling works within Biml; subsequent posts will show the source of the issue and a workaround.

Preface

Biml allows you to define the tables, views, and constraints in your database. Let's look at a minimal viable table definition for dbo.AWBuildVersion from AdventureWorks2014. Ready?


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Adventureworks" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=AdventureWorks2014;Provider=SQLNCLI11;Integrated Security=SSPI;" />
    </Connections>
    <Databases>
        <Database ConnectionName="Adventureworks" Name="AdventureWorks2014" />
    </Databases>
    <Schemas>
        <Schema Name="dbo" DatabaseName="AdventureWorks2014" />
    </Schemas>
    <Tables>
        <Table Name="AWBuildVersion" SchemaName="AdventureWorks2014.dbo">
            <Columns>
                <Column Name="SystemInformationID" DataType="Byte" IdentityIncrement="1" />
                <Column Name="Database Version" DataType="String" Length="25" />
                <Column Name="VersionDate" DataType="DateTime" />
                <Column Name="ModifiedDate" DataType="DateTime" />
            </Columns>
            <Keys>
                <PrimaryKey Name="PK_AWBuildVersion_SystemInformationID" Clustered="false">
                    <Columns>
                        <Column ColumnName="SystemInformationID" />
                    </Columns>
                </PrimaryKey>
            </Keys>
            <Indexes>
            </Indexes>
        </Table>
    </Tables>
</Biml>

Wow, that's a lot! Let's break it down.

Connections

Our Connections collection has a single entity in it, an OLE DB Connection named Adventureworks (remember, all of this is case sensitive, so this Adventureworks is a different beast from AdventureWorks, ADVENTUREWORKS, etc). This provides enough information to make a database connection. Of note, we have the server and catalog/database name defined in there. The type of connection used determines the specific property names used, i.e. Initial Catalog & Data Source versus Server & Database, etc. Look at ConnectionStrings.com if you really want to see how rich (horrible) this becomes.

Databases

A Database (AstDatabaseNode) requires a Name and a ConnectionName. We certainly know the connection since we just defined it in the previous section and so here I'm naming the Database AdventureWorks2014. This just happens to align with the value specified in Initial Catalog but use whatever is natural. Do not name it after the environment though, please. There is nothing worse than talking about an entity named "DevSalesDb" that actually references the production database but was named after the environment it was originally built from.

Schemas

A Schema (AstSchemaNode) requires a Name and a DatabaseName (see above). Since I'm after a table in the dbo schema, I just specify it as the name.

Tables

Finally, the Table (AstTableNode), which requires a Name and a SchemaName. Where have we seen this pattern? However, look at the value of the SchemaName. We have to qualify the schema with the database because we could have two Schema entities named dbo that point to different Database entities.
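
To see why that qualification matters, here's a hedged sketch (the StagingDB database and its schema are hypothetical, not part of AdventureWorks) of two dbo schemas living under different databases; only the qualified SchemaName on the table tells them apart:

<Schemas>
    <Schema Name="dbo" DatabaseName="AdventureWorks2014" />
    <Schema Name="dbo" DatabaseName="StagingDB" />
</Schemas>
<Tables>
    <!-- Lands in StagingDB's dbo schema, not AdventureWorks2014's -->
    <Table Name="AWBuildVersion" SchemaName="StagingDB.dbo">
        <!-- columns, keys, indexes as before -->
    </Table>
</Tables>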

Once inside the Table entity, we can define our columns, keys, indices as our heart desires.

Wrap up

An amusing side note: if you're using Mist/BimlStudio to import the Schema and Table, the wizard renders all of this correctly; there only seems to be a defect in how I'm scripting the above entities.

Getting Windows share via python


Windows network shares with python

Backstory

On a daily basis, we receive data extracts from a mainframe. They provide a header and data file for whatever the business users want to explore. This client has lots of old data ferreted away and they need to figure out if there's value in it. Our job is to consume the header files to drop and create tables in SQL Server and then populate with actual data. The SQL is trivial -


CREATE TABLE Foo (Col1 varchar(255), ColN varchar(255));
BULK INSERT Foo FROM 'C:\sourceFile.csv' WITH (FIRSTROW=1,ROWTERMINATOR='\n',FIELDTERMINATOR='|');

Let's make this harder than it should be

Due to ... curious permissions and corporate politics, the SQL Server service account could only read files via a network share (\\Server\Share\Input\File.csv), never you mind the fact that the path was really just D:\Share\Input. A local drive, but permissions were such that we couldn't allow the service account to read from the drive directly. Opening up a network share and letting the account read from that? No problem.

What are the shares?

That's an easy question to answer, because I knew the answer: net share. I coded up a simple parser and all was well and good until I ran it on a server which had some really long share names and/or long Resource paths. Like this


Share name   Resource                        Remark

-------------------------------------------------------------------------------
C$           C:\                             Default share
IPC$                                         Remote IPC
ADMIN$       C:\WINDOWS                      Remote Admin
DEV2016      \\?\GLOBALROOT\Device\RsFx0410\\DEV2016
                                             SQL Server FILESTREAM share
RidiculouslyLongShareName
             C:\users\bfellows\Downloads
The command completed successfully.
Super. The output of net share is quasi fixed width and it just wraps whatever it needs to onto the next line/column.

What are the shares, v2

Windows Management Instrumentation to the rescue!

WMIC.exe /output:stdout /namespace:\\root\cimv2 path Win32_Share GET Name, Path

That's way better, sort of.


Name                       Path
ADMIN$                     C:\WINDOWS
C$                         C:\
DEV2016                    \\?\GLOBALROOT\Device\RsFx0410\\DEV2016
IPC$
RidiculouslyLongShareName  C:\users\bfellows\Downloads
Originally, that command ended with GET * which resulted in a lot more information being returned than I needed. The devil though, is that the output width is dependent upon the source data. If I remove the network share for my RidiculouslyLongShareName and rerun the command, I get this output

Name     Path
ADMIN$   C:\WINDOWS
C$       C:\
DEV2016  \\?\GLOBALROOT\Device\RsFx0410\\DEV2016
IPC$
Users    C:\Users
It appears to be the longest element plus 2 spaces for this data, but who knows what the real encoding rule is. The good thing is that, while the width is variable, the header row gives me enough information to slice up the data as needed.

This needs to run anywhere

The next problem is that this process in Dev runs on D:\Share but in QA it is on I:\datafiles\instance1 and, oh by the way, there are two shares for the I drive: \\qa\Instance1 (I:\datafiles\instance1) and \\qa\datafiles (I:\datafiles). In the case where there are multiple shares, if there's one for the folder where the script is running, that's the one we want. Otherwise, it's probably the "nearest" path, which I interpreted as having the longest path.

Code good

Here's my beautiful, hacky python. Wherever this script runs, it will then attempt to render the best share path to the same location.


import os
import subprocess

def _generate_share_dictionary(headerRow):
    """Accepts a variable width, white space delimited string that we attempt
    to divine column delimiters from. Returns a dictionary of field names
    and a tuple with start/stop slice positions"""

    # This used to be a more complex problem before I realized I didn't have
    # to do GET * in my source. GET Name, Path greatly simplifies
    # but this code is generic so I keep it as is

    header = headerRow
    fields = header.split()
    tempOrds = {}
    ords = {}
    # Populate the temporary ordinals dictionary with field name and the
    # starting, zero based, ordinal for it.
    # i.e. given
    # Name     Path
    # 01234567890123456789
    # we would expect Name:0, Path:9
    for field in fields:
        tempOrds[field] = headerRow.index(field)

    # Knowing our starting ordinal positions, we will build a dictionary of tuples
    # that contain starting and ending positions of our fields
    for iter in range(0, len(fields) - 1):
        ords[fields[iter]] = (tempOrds[fields[iter]], tempOrds[fields[iter + 1]])

    # handle the last element
    ords[fields[-1]] = (tempOrds[fields[-1]], len(headerRow))

    return ords


def get_network_shares():
    """Use WMIC to get the full share list. Needed because "net share" isn't parseable"""
    _command = r"C:\Windows\System32\wbem\WMIC.exe /output:stdout /namespace:\\root\cimv2 path Win32_Share GET Name, Path"
    #_command = r"C:\Windows\System32\wbem\WMIC.exe /output:stdout /namespace:\\root\cimv2 path Win32_Share GET *"
    _results = subprocess.check_output(_command, shell=True).decode('UTF-8')

    _headerRow = _results.splitlines()[0]
    headerOrdinals = _generate_share_dictionary(_headerRow)

    _shares = parse_network_shares_name_path(headerOrdinals, _results)
    return _shares


def parse_network_shares_name_path(header, results):
    """Rip apart the results using our header dictionary"""
    _shares = {}
    # use the above to slice into our results
    # skipping first line since it is the header
    for _line in results.splitlines()[1:]:
        if _line:
            _shares[_line[header["Name"][0]:header["Name"][1]].rstrip()] = _line[header["Path"][0]:header["Path"][1]].rstrip()
    return _shares


def translate_local_path_to_share(currentPath):
    """Convert the supplied path to the best match in the shares list"""
    shareName = ""
    defaultShare = ""
    shares = get_network_shares()

    # find the first share match
    if currentPath in shares.values():
        shareName = [key for key, value in shares.items() if value == currentPath][0]
    else:
        # see if we can find a partial match
        # favor longest path
        pathLength = 0
        for share, path in shares.items():
            # path can be empty due to IPC$ share
            if path:
                # Is the share even applicable?
                if path in currentPath:
                    # Favor the non default/admin share (DriveLetter$)
                    if share.endswith('$'):
                        defaultShare = currentPath.replace(path[:-1], share)
                    else:
                        if len(path) > pathLength:
                            shareName = currentPath.replace(path[:-1], share)
                            pathLength = len(path)

    # No other share was found
    if (defaultShare and not shareName):
        shareName = defaultShare
    x = os.path.join(r"\\" + os.environ['COMPUTERNAME'], shareName)
    print("Current folder {} maps to {}".format(currentPath, x))

    return os.path.join(r"\\" + os.environ['COMPUTERNAME'], shareName)


def main():
    current = os.getcwd()
    #current = "C:\WINDOWS"
    share = translate_local_path_to_share(current)
    print("{} aka {}".format(current, share))

if __name__ == "__main__":
    main()

Takeaways

You probably won't ever need all of the above code to be able to swap out a local path for a network share using python but by golly if you do, have fun. Also, python is still my most favorite language, 14 years running.
