Channel: World of Whatever

Variable scoping in TSQL isn't a thing


It's a pop quiz kind of day: run the code through your mental parser.


BEGIN TRY
DECLARE @foo varchar(30) = 'Created in try block';
DECLARE @i int = 1 / 0;
END TRY
BEGIN CATCH
PRINT @foo;
SET @foo = 'Catch found';
END CATCH;

PRINT @foo;
  • It won't compile since @foo goes out of scope for both the catch and the final line
  • It won't compile since @foo goes out of scope for the final line
  • It prints "Created in try block" and then "Catch found"
  • I am too fixated on your form not having a submit button

Crazily enough, the last two are correct. It seems that, unlike every other language I've worked with, T-SQL variables aren't block scoped: every variable shares the same batch-level scope regardless of where in the script it is declared. Demo the first
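
A minimal sketch of my own (not in the original post): a DECLARE inside a block that can never execute still creates the variable for the whole batch; only the assignment is skipped.

IF 1 = 0
BEGIN
    DECLARE @bar varchar(30) = 'Never assigned';
END;

-- Prints "Declared, but NULL" because @bar exists yet was never assigned
PRINT COALESCE(@bar, 'Declared, but NULL');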

Wanna see something even more crazy? Check this version out


BEGIN TRY
DECLARE @i int = 1 / 0;
DECLARE @foo varchar(30) = 'Created in try block';
END TRY
BEGIN CATCH
PRINT @foo;
SET @foo = 'Catch found';
END CATCH;

PRINT @foo;

As above, the scoping of variables remains the same, but the forced divide-by-zero error occurs before the declaration and initialization of our variable @foo. The result? @foo exists, because the DECLARE was still processed when the batch was parsed, but it remains uninitialized since the value assignment never ran, as evidenced by the first PRINT in the CATCH block. Second demo

What's all this mean? SQL's weird.


Rename default constraints

This week I'm dealing with synchronizing tables between environments and it seems that regardless of what tool I'm using for schema compare, it still gets hung up on the differences in the default names for constraints. Rather than fight that battle, I figured it'd greatly simplify my life to systematically rename all my constraints to non-default names. The naming convention I went with is DF__schema name_table name_column name. I know that my schemas/tables/columns don't have spaces or "weird" characters in them, so this works for me. Use this at your own risk, and if you are on pre-2012, the CONCAT call will need to be adjusted to classic string concatenation, a.k.a. +.

DECLARE @query nvarchar(4000);
DECLARE
    CSR CURSOR
FAST_FORWARD
FOR
SELECT
    CONCAT('ALTER TABLE ', QUOTENAME(S.name), '.', QUOTENAME(T.name), ' DROP CONSTRAINT [', DC.name, '];', CHAR(10)
    , 'ALTER TABLE ', QUOTENAME(S.name), '.', QUOTENAME(T.name)
    , ' ADD CONSTRAINT [', 'DF__', (S.name), '_', (T.name), '_', C.name, ']'
    , ' DEFAULT ', DC.definition, ' FOR ', QUOTENAME(C.name)) AS Query
FROM
    sys.schemas AS S
    INNER JOIN
        sys.tables AS T
        ON T.schema_id = S.schema_id
    INNER JOIN
        sys.columns AS C
        ON C.object_id = T.object_id
    INNER JOIN
        sys.default_constraints AS DC
        ON DC.parent_object_id = T.object_id
        AND DC.object_id = C.default_object_id
WHERE
    DC.name LIKE 'DF__%'
    AND DC.name <> CONCAT('DF__', (S.name), '_', (T.name), '_', C.name);

OPEN CSR;
FETCH NEXT FROM CSR INTO @query;
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        EXECUTE sys.sp_executesql @query, N'';
    END TRY
    BEGIN CATCH
        PRINT ERROR_MESSAGE();
        PRINT @query;
    END CATCH
    FETCH NEXT FROM CSR INTO @query;
END
CLOSE CSR;
DEALLOCATE CSR;
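
For pre-2012 instances, the CONCAT calls would be swapped for classic + concatenation. A rough sketch of my own (not tested against those versions) of just the query portion:

SELECT
    'ALTER TABLE ' + QUOTENAME(S.name) + '.' + QUOTENAME(T.name)
    + ' DROP CONSTRAINT ' + QUOTENAME(DC.name) + ';' + CHAR(10)
    + 'ALTER TABLE ' + QUOTENAME(S.name) + '.' + QUOTENAME(T.name)
    + ' ADD CONSTRAINT [DF__' + S.name + '_' + T.name + '_' + C.name + ']'
    + ' DEFAULT ' + DC.definition + ' FOR ' + QUOTENAME(C.name) AS Query
FROM
    sys.schemas AS S
    INNER JOIN
        sys.tables AS T
        ON T.schema_id = S.schema_id
    INNER JOIN
        sys.columns AS C
        ON C.object_id = T.object_id
    INNER JOIN
        sys.default_constraints AS DC
        ON DC.parent_object_id = T.object_id
        AND DC.object_id = C.default_object_id
WHERE
    DC.name LIKE 'DF__%'
    AND DC.name <> 'DF__' + S.name + '_' + T.name + '_' + C.name;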

Generate TSQL time slices

I had some log data I wanted to bucket into 15-second time slices, and I figured if I've solved this once, I will need to do it again, so to the blog machine! This uses LEAD, TIMEFROMPARTS, and ROW_NUMBER to accomplish the slicing.

SELECT
    D.Slice AS SliceStart
    , LEAD
      (
        D.Slice
        , 1
        -- Default to midnight
        , TIMEFROMPARTS(0, 0, 0, 0, 0)
      ) OVER (ORDER BY D.Slice) AS SliceStop
    , ROW_NUMBER() OVER (ORDER BY D.Slice) AS SliceLabel
FROM
(
    -- Generate 15 second time slices
    SELECT
        TIMEFROMPARTS(A.rn, B.rn, C.rn, 0, 0) AS Slice
    FROM
        (SELECT TOP (24) -1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_objects AS AO) AS A(rn)
        CROSS APPLY (SELECT TOP (60) (-1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM sys.all_objects AS AO) AS B(rn)
        -- 4 values since we'll aggregate to 15 seconds
        CROSS APPLY (SELECT TOP (4) (-1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) * 15 FROM sys.all_objects AS AO) AS C(rn)
) AS D;

That looks like a lot, but it really isn't. Starting from the innermost query, we select the top 24 rows from sys.all_objects and use the ROW_NUMBER function to generate a monotonically increasing set of values, 1...24. However, since the allowable range of hours is 0 to 23, I deduct one from each value (A). I repeat this pattern to generate minutes (B), except we take the top 60. Since I want 15-second intervals, for the seconds query I only take the top 4 values. I deduct one so we have {0,1,2,3} and then multiply by 15 to get my increments (C). If you want different time slices, that's the part of the pattern to modify.

Finally, having 3 columns of numbers, I use TIMEFROMPARTS to build a time data type with the least amount of precision, present that as "Slice", and encapsulate it all in a derived table (D). Running that query gets me a list of periods, but I don't know what the end of each period is.

We can calculate the end period by using the LEAD function. I present my original Slice as SliceStart. I then use the LEAD function to fetch the next (1) value based on the Slice column. In the case of 23:59:45, the "next" value in our data set is NULL. To address that scenario, we pass a default value, midnight, as the third argument to LEAD.
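
If you wanted a different slice width, you'd adjust the driver queries accordingly. A minimal sketch of my own (not from the original post) for 5-minute slices: 24 hours crossed with 12 five-minute buckets, with seconds pinned at zero.

-- 5 minute slices: 24 hours x 12 buckets of 5 minutes
SELECT
    TIMEFROMPARTS(A.rn, B.rn, 0, 0, 0) AS Slice
FROM
    (SELECT TOP (24) -1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_objects AS AO) AS A(rn)
    CROSS APPLY (SELECT TOP (12) (-1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) * 5 FROM sys.all_objects AS AO) AS B(rn);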

Biml build Database collection nodes


Biml build Database collection nodes aka what good are Annotations

In many of the walkthroughs on creating relational objects via Biml, it seems like people skim over the Databases collection. There's nothing built into the language to really support the creation of database nodes. The import database operations are focused on tables and schemas and assume the database node(s) have been created. I hate assumptions.

Connections.biml

Add a biml file to your project to house your connection information. I'm going to create two connections to the same Adventureworks 2014 database.


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Aw2014" ConnectionString="Provider=SQLNCLI11;Server=localhost\dev2016;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;" />
        <OleDbConnection Name="Aw2014Annotated" ConnectionString="Provider=SQLNCLI11;Server=localhost\dev2016;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;">
            <Annotations>
                <Annotation Tag="DatabaseName">AdventureWorks2014</Annotation>
                <Annotation Tag="ServerName">localhost\dev2016</Annotation>
                <Annotation Tag="Provider">SQLNCLI11</Annotation>
            </Annotations>
        </OleDbConnection>
    </Connections>
</Biml>

The only material difference between the first and second OleDbConnection node is the declaration of Annotations on the second instance. Annotations are free form entities that allow you to enrich your Biml with more metadata. Here I've duplicated the DatabaseName, ServerName and Provider properties from my connection string. As you'll see in the upcoming BimlScript, prior proper planning prevents poor performance.

Databases.biml

The Databases node is a collection of AstDatabaseNode. They require a Name and a ConnectionName to be valid. Let's look at how we can construct these nodes based on information in our project. The first question I'd ask is what metadata do I readily have available? My Connections collection - that is already built out so I can enumerate through the items in there to populate the value of the ConnectionName. The only remaining item then is the Name for our database. I can see three ways of populating it: parsing the connection string for the database name, instantiating the connection manager and querying the database name from the RDBMS, or pulling the database name from our Annotations collection.

The basic approach would take the form


<Databases>
<#
string databaseName = string.Empty;

foreach(AstOleDbConnectionNode conn in this.RootNode.OleDbConnections)
{
databaseName = "unknown";
// Logic goes here!
#>
<Database Name="<#= conn.Name #>.<#= databaseName#>" ConnectionName="<#= conn.Name #>" />
<#
}
#>
</Databases>

The logic we stuff in there can be as simple or complex as needed but the end result would be a well formed Database node.

Parsing

Connection strings are delimited (semicolon) key value pairs unique to the connection type. In this approach we'll split the connection string by semicolon and then split each resulting entity by the equals sign.


// use string parsing and prayer
try
{
string [] kVpish;
KeyValuePair<string, string> kvp ;
// This is probably the most fragile as it needs to take into consideration all the
// possible connection styles. Get well acquainted with ConnectionStrings.com
// Split our connnection string based on the delimiter of a semicolon
foreach (var element in conn.ConnectionString.Split(new char [] {';'}))
{
kVpish = element.Split(new char[]{'='});
if(kVpish.Count() > 1)
{
kvp = new KeyValuePair<string, string>(kVpish[0], kVpish[1]);
if (String.Compare("Initial Catalog", kvp.Key, StringComparison.InvariantCultureIgnoreCase) == 0)
{
databaseName = kvp.Value;
}
}
}
}
catch (Exception ex)
{
databaseName = string.Format("{0}_{1}", "Error", ex.ToString());
}

The challenge with this approach is its fragility. As ConnectionStrings.com can attest, there are a lot of ways of constructing a connection string, and they don't agree on what the database name property is called.

Query

Another approach would be to instantiate the connection manager and then query the information schema equivalent and ask the database what the database name is. SQL Server makes this easy


// Or we can use database connectivity
// This query would need to have intelligence built into it to issue correct statement
// per database. This works for SQL Server only
string queryDatabaseName = @"SELECT db_name(db_id()) AS CurrentDB;";
System.Data.DataTable dt = null;
dt = ExternalDataAccess.GetDataTable(conn, queryDatabaseName);
foreach (System.Data.DataRow row in dt.Rows)
{
databaseName = row[0].ToString();
}

The downside to this is that, much like parsing the connection string, it's going to be provider specific. Plus, this will be the slowest option due to the cost of instantiating connections, to say nothing of ensuring the build server has all the correct drivers installed.

Annotations

"Prior proper planning prevents poor performance" so let's spend a moment up front to define our metadata and then use Linq to extract that information. Since we can't guarantee that a connection node has an Annotation tag named DatabaseName, we need to test for the existence (Any) and if we find one, we'll extract the value.


if (conn.Annotations.Any(an => an.Tag=="DatabaseName"))
{
// We use the Select method to pull out only the thing, Text, that we are interested in
databaseName = conn.Annotations.Where(an => an.Tag=="DatabaseName").Select(t => t.Text).First().ToString();
}

The downside to this approach is that it requires planning and a bit of double entry, as you need to keep the metadata (annotations) synchronized with the actual connection string. But since we're the automating kind of people, that shouldn't be a problem...

Databases.biml

Putting it all together, our Databases.biml file becomes


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#@ template tier="10" #>
<Databases>
<#
foreach(AstOleDbConnectionNode conn in this.RootNode.OleDbConnections)
{
string databaseName = "unknown";

// Test whether the annotation collection contains a tag named DatabaseName
if (conn.Annotations.Any(an => an.Tag=="DatabaseName"))
{
// We use the Select method to pull out only the thing, Text, that we are interested in
databaseName = conn.Annotations.Where(an => an.Tag=="DatabaseName").Select(t => t.Text).First().ToString();
}
else
{
// No annotations found
bool useStringParsing = true;
if (useStringParsing)
{
// use string parsing and prayer
try
{
string [] kVpish;
KeyValuePair<string, string> kvp ;
// This is probably the most fragile as it needs to take into consideration all the
// possible connection styles. Get well acquainted with ConnectionStrings.com
// Split our connnection string based on the delimiter of a semicolon
foreach (var element in conn.ConnectionString.Split(new char [] {';'}))
{
kVpish = element.Split(new char[]{'='});
if(kVpish.Count() > 1)
{
kvp = new KeyValuePair<string, string>(kVpish[0], kVpish[1]);
if (String.Compare("Initial Catalog", kvp.Key, StringComparison.InvariantCultureIgnoreCase) == 0)
{
databaseName = kvp.Value;
}
}
}
}
catch (Exception ex)
{
databaseName = string.Format("{0}_{1}", "Error", ex.ToString());
}
}
else
{
// Or we can use database connectivity
// This query would need to have intelligence built into it to issue correct statement
// per database. This works for SQL Server only
string queryDatabaseName = @"SELECT db_name(db_id()) AS CurrentDB;";
System.Data.DataTable dt = null;
dt = ExternalDataAccess.GetDataTable(conn, queryDatabaseName);
foreach (System.Data.DataRow row in dt.Rows)
{
databaseName = row[0].ToString();
}
}

}

#>
<Database Name="<#= conn.Name #>.<#= databaseName#>" ConnectionName="<#= conn.Name #>" />
<#
}
#>
</Databases>
</Biml>

And that's what it takes to use BimlScript to build out the Database collection nodes based on annotations, string parsing, or database querying. Use Annotations to enrich your Biml objects with good metadata and then you can use them to simplify future operations.

Python pyinstaller erroring with takes 4 positional arguments but 5 were given


Pyinstaller is a program for turning python files into executable programs. This is helpful as it removes the requirement for having the python interpreter installed on a target computer. What was really weird was I could generate a multi-file package (pyinstaller .\MyFile.py) but not a onefile version.

C:\tmp>pyinstaller -onefile .\MyFile.py

Traceback (most recent call last):
File "C:\Program Files (x86)\Python35-32\Scripts\pyinstaller-script.py", line 9, in
load_entry_point('PyInstaller==3.2.1', 'console_scripts', 'pyinstaller')()
File "c:\program files (x86)\python35-32\lib\site-packages\PyInstaller\__main__.py", line 73, in run
args = parser.parse_args(pyi_args)
File "c:\program files (x86)\python35-32\lib\argparse.py", line 1727, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "c:\program files (x86)\python35-32\lib\argparse.py", line 1759, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "c:\program files (x86)\python35-32\lib\argparse.py", line 1967, in _parse_known_args
start_index = consume_optional(start_index)
File "c:\program files (x86)\python35-32\lib\argparse.py", line 1907, in consume_optional
take_action(action, args, option_string)
File "c:\program files (x86)\python35-32\lib\argparse.py", line 1835, in take_action
action(self, namespace, argument_values, option_string)
TypeError: __call__() takes 4 positional arguments but 5 were given

What's the root cause? The argument is --onefile, with two leading hyphens, not -onefile.

SQL Server Query Metadata


SQL Server Query Metadata

Pop quiz: how do you determine the metadata of a query in SQL Server? For a table, you can query sys.schemas/sys.tables/sys.columns, but a query? You might start pulling the query apart and looking up each column and its metadata, but then you have to factor in function calls and suddenly you're writing a parser within your query and you have an infinite recursion error.

But, if you're on SQL Server 2012+, you have a friend in sys.dm_exec_describe_first_result_set.

Let's start with a random query from Glen Berry's diagnostic query set


-- Drive level latency information (Query 28) (Drive Level Latency)
-- Based on code from Jimmy May
SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point],
CASE
WHEN num_of_reads = 0 THEN 0
ELSE (io_stall_read_ms/num_of_reads)
END AS [Read Latency],
CASE
WHEN num_of_writes = 0 THEN 0
ELSE (io_stall_write_ms/num_of_writes)
END AS [Write Latency],
CASE
WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
ELSE (io_stall/(num_of_reads + num_of_writes))
END AS [Overall Latency],
CASE
WHEN num_of_reads = 0 THEN 0
ELSE (num_of_bytes_read/num_of_reads)
END AS [Avg Bytes/Read],
CASE
WHEN num_of_writes = 0 THEN 0
ELSE (num_of_bytes_written/num_of_writes)
END AS [Avg Bytes/Write],
CASE
WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes))
END AS [Avg Bytes/Transfer]
FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs
GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
ORDER BY [Overall Latency] OPTION (RECOMPILE);

Drive | Volume Mount Point | Read Latency | Write Latency | Overall Latency | Avg Bytes/Read | Avg Bytes/Write | Avg Bytes/Transfer
C:C:\00064447449331990

The results of the query aren't exciting, but what are the columns and expected data types? Pre-2012, most people dump the query results into a table with an impossible filter like WHERE 1=2 and then query the above system tables.
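
For reference, a minimal sketch of that pre-2012 pattern, using sys.dm_os_wait_stats as a stand-in for the real query and a throwaway #QueryShape temp table of my own naming:

-- Materialize an empty copy of the result set, then ask tempdb's catalog what came back
SELECT
    *
INTO
    #QueryShape
FROM
    sys.dm_os_wait_stats AS W
WHERE
    1 = 2;

SELECT
    C.name
    , T.name AS type_name
    , C.max_length
    , C.precision
    , C.scale
FROM
    tempdb.sys.columns AS C
    INNER JOIN
        tempdb.sys.types AS T
        ON T.user_type_id = C.user_type_id
WHERE
    C.object_id = OBJECT_ID('tempdb..#QueryShape');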

With the power of SQL Server 2012+, let's see what we can do. I'm going to pass our query in as the first argument and specify NULL for the next two parameters.


SELECT
DEDFRS.column_ordinal
, DEDFRS.name
, DEDFRS.is_nullable
, DEDFRS.system_type_name
, DEDFRS.max_length
, DEDFRS.precision
, DEDFRS.scale
FROM
sys.dm_exec_describe_first_result_set(N'
SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point],
CASE
WHEN num_of_reads = 0 THEN 0
ELSE (io_stall_read_ms/num_of_reads)
END AS [Read Latency],
CASE
WHEN num_of_writes = 0 THEN 0
ELSE (io_stall_write_ms/num_of_writes)
END AS [Write Latency],
CASE
WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
ELSE (io_stall/(num_of_reads + num_of_writes))
END AS [Overall Latency],
CASE
WHEN num_of_reads = 0 THEN 0
ELSE (num_of_bytes_read/num_of_reads)
END AS [Avg Bytes/Read],
CASE
WHEN num_of_writes = 0 THEN 0
ELSE (num_of_bytes_written/num_of_writes)
END AS [Avg Bytes/Write],
CASE
WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes))
END AS [Avg Bytes/Transfer]
FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs
GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
ORDER BY [Overall Latency] OPTION (RECOMPILE);'
, NULL, NULL) AS DEDFRS;

Look at our results. Now you can see the column names from our query, their basic type and whether they're nullable. That's pretty freaking handy.

column_ordinal | name | is_nullable | system_type_name | max_length | precision | scale
1 | Drive | 1 | nvarchar(2) | 4 | 0 | 0
2 | Volume Mount Point | 1 | nvarchar(256) | 512 | 0 | 0
3 | Read Latency | 1 | bigint | 8 | 19 | 0
4 | Write Latency | 1 | bigint | 8 | 19 | 0
5 | Overall Latency | 1 | bigint | 8 | 19 | 0
6 | Avg Bytes/Read | 1 | bigint | 8 | 19 | 0
7 | Avg Bytes/Write | 1 | bigint | 8 | 19 | 0
8 | Avg Bytes/Transfer | 1 | bigint | 8 | 19 | 0

I'm thinking that I can use this technique against an arbitrary source of queries to build out the result tables and then ETL data into them. That should simplify my staging step for table loads. What can you use this for? Add links in the comments showing how you use sys.dm_exec_describe_first_result_set.
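
As a rough sketch of that idea (mine, with a trivial query and a made-up dbo.Staged target), the DMV output folds fairly naturally into a CREATE TABLE statement:

DECLARE @query nvarchar(max) = N'SELECT 100 AS demo, GETDATE() AS LoadDate;';

SELECT
    CONCAT
    (
        'CREATE TABLE dbo.Staged ('
        , STUFF
        (
            (
                SELECT
                    CONCAT(', ', QUOTENAME(DEDFRS.name), ' ', DEDFRS.system_type_name
                        , CASE DEDFRS.is_nullable WHEN 1 THEN ' NULL' ELSE ' NOT NULL' END)
                FROM
                    sys.dm_exec_describe_first_result_set(@query, NULL, NULL) AS DEDFRS
                ORDER BY
                    DEDFRS.column_ordinal
                FOR XML PATH('')
            )
            , 1, 2, ''
        )
        , ');'
    ) AS CreateTableStatement;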

Biml Query Table Builder


Biml Query Table Builder

We previously noted the awesomeness that is SQL Server 2012+'s ability to expose a query's metadata. Let's look at how we can couple that information with creating Biml Table objects.

Prep work

Add a static Biml file to your project that defines an OLE DB connection and then a database and schema, e.g.


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="msdb" ConnectionString="Provider=SQLNCLI11;Data Source=localhost\dev2016;Integrated Security=SSPI;Initial Catalog=msdb" />
    </Connections>
    <Databases>
        <Database Name="msdb" ConnectionName="msdb" />
    </Databases>
    <Schemas>
        <Schema Name="dbo" DatabaseName="msdb" />
    </Schemas>
</Biml>

Now that we have a database connection named msdb and a valid database and schema, save the file and let's get into the good stuff.

Given the reference query in the previous post, Drive level latency information, we would need to declare the following Biml within our Tables collection.


<Table Name="Query 28" SchemaName="msdb.dbo">
    <Columns>
        <Column Name="Drive" DataType="String" Length="4" Precision="0" Scale="0" IsNullable="true" />
        <Column Name="Volume Mount Point" DataType="String" Length="512" Precision="0" Scale="0" IsNullable="true" />
        <Column Name="Read Latency" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
        <Column Name="Write Latency" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
        <Column Name="Overall Latency" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
        <Column Name="Avg Bytes/Read" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
        <Column Name="Avg Bytes/Write" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
        <Column Name="Avg Bytes/Transfer" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
    </Columns>
</Table>

That could be accomplished purely within the declarative nature of Biml, wherein we do lots of text nuggets <#= "foo" #>, but that's going to be ugly to maintain as there's a lot of conditional logic to muck with. Instead, I'm going to create a C# method that returns the Biml table object (AstTableNode). To do that, we will need to create a Biml class nugget <#+ #>. I ended up creating two methods: GetAstTableNodeFromQuery and a helper method to translate the SQL Server data types into something Biml understood.


<#+
/// <summary>
/// Build out a Biml table based on the supplied query and connection.
/// This assumes a valid SQL Server 2012+ OLEDB Connection is provided but the approach
/// can be adapted based on providers and information schemas.
/// We further assume that column names in the query are unique.
/// </summary>
/// <param name="connection">An OleDbConnection</param>
/// <param name="query">A SQL query</param>
/// <param name="schemaName">The schema our table should be created in</param>
/// <param name="queryName">A name for our query</param>
/// <returns>An AstTableNode built from the query's first result set</returns>
public AstTableNode GetAstTableNodeFromQuery(AstOleDbConnectionNode connection, string query, string schemaName, string queryName)
{
string template = @"SELECT
DEDFRS.name
, DEDFRS.is_nullable
, DEDFRS.system_type_name
, DEDFRS.max_length
, DEDFRS.precision
, DEDFRS.scale
FROM
sys.dm_exec_describe_first_result_set(N'{0}', NULL, NULL) AS DEDFRS ORDER BY DEDFRS.column_ordinal;"
;
AstTableNode atn = null;

atn = new AstTableNode(null);
atn.Name = queryName;
atn.Schema = this.RootNode.Schemas[schemaName];
string queryActual = string.Format(template, query.Replace("'", "''"));

string colName = string.Empty;
string typeText = string.Empty;
System.Data.DbType dbt = DbType.UInt16;
int length = 0;
int precision = 0;
int scale = 0;

try
{
System.Data.DataTable dt = null;
dt = ExternalDataAccess.GetDataTable(connection, queryActual);
foreach (System.Data.DataRow row in dt.Rows)
{
try
{
AstTableColumnBaseNode col = new AstTableColumnNode(atn);
// This can be empty -- see DBCC TRACESTATUS (-1)
if(row[0] == DBNull.Value)
{
atn.Annotations.Add(new AstAnnotationNode(atn){Tag = "Invalid", Text = "No Metadata generated"});
break;
}
else
{
colName = row[0].ToString();
}

typeText = row[2].ToString();
dbt = TranslateSqlServerTypes(row[2].ToString());
length = int.Parse(row[3].ToString());
precision = int.Parse(row[4].ToString());
scale = int.Parse(row[5].ToString());

col.Name = colName;
col.IsNullable = (bool)row[1];
col.DataType = dbt;
col.Length = length;
col.Precision = precision;
col.Scale = scale;

atn.Columns.Add(col);
}
catch (Exception ex)
{
// Something went awry with making a column for our table
AstTableColumnBaseNode col = new AstTableColumnNode(atn);
col.Name = "FailureColumn";
col.Annotations.Add(new AstAnnotationNode(col){Tag = "colName", Text = colName});
col.Annotations.Add(new AstAnnotationNode(col){Tag = "typeText", Text = typeText});
col.Annotations.Add(new AstAnnotationNode(col){Tag = "dbtype", Text = dbt.ToString()});
col.Annotations.Add(new AstAnnotationNode(col){Tag = "Error", Text = ex.Message});
col.Annotations.Add(new AstAnnotationNode(col){Tag = "Stack", Text = ex.StackTrace});
atn.Columns.Add(col);
}
}
}
catch (Exception ex)
{
// Table level failures
AstTableColumnBaseNode col = new AstTableColumnNode(atn);
col.Name = "Failure";
col.Annotations.Add(new AstAnnotationNode(col){Tag = "Error", Text = ex.ToString()});
col.Annotations.Add(new AstAnnotationNode(col){Tag = "SourceQuery", Text = query});
col.Annotations.Add(new AstAnnotationNode(col){Tag = "QueryActual", Text = queryActual});
atn.Columns.Add(col);
}
return atn;
}

/// <summary>
/// A rudimentary method to convert SQL Server data types to Biml types. Doesn't cover
/// UDDT, sql_variant(well)
/// </summary>
/// <param name="typeName">Data type with optional length/scale/precision</param>
/// <returns>Best approximation of a SQL Server data type</returns>
public DbType TranslateSqlServerTypes(string typeName)
{
// typeName might contain length - strip it
string fixedName = typeName;
if(typeName.Contains("("))
{
fixedName = typeName.Substring(0, typeName.IndexOf("("));
}
// Approximate translation of https://msdn.microsoft.com/en-us/library/System.Data.DbType.aspx
// https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-data-type-mappings
Dictionary<string, DbType> translate = new Dictionary<string, DbType> {

{"bigint", DbType.Int64 }
, {"binary", DbType.Binary }
, {"bit", DbType.Boolean }
, {"char", DbType.AnsiStringFixedLength }
, {"date", DbType.Date }
, {"datetime", DbType.DateTime }
, {"datetime2", DbType.DateTime2 }
, {"datetimeoffset", DbType.DateTimeOffset }
, {"decimal", DbType.Decimal }
, {"float", DbType.Double }
//, {"geography",
//, {"geometry",
//, {"hierarchyid",
, {"image", DbType.Binary }
, {"int", DbType.Int32 }
, {"money", DbType.Decimal }
, {"nchar", DbType.StringFixedLength }
, {"ntext", DbType.String }
, {"numeric", DbType.Decimal }
, {"nvarchar", DbType.String }
, {"real", DbType.Single }
, {"smalldatetime", DbType.DateTime }
, {"smallint", DbType.Int16 }
, {"smallmoney", DbType.Decimal }
, {"sql_variant", DbType.Object }
, {"sysname", DbType.String }
, {"text", DbType.String }
, {"time", DbType.Time }
, {"timestamp", DbType.Binary }
, {"tinyint", DbType.Byte }
, {"uniqueidentifier", DbType.Guid }
, {"varbinary", DbType.Binary }
, {"varchar", DbType.AnsiString }
, {"xml", DbType.Xml }
};

try
{
return translate[fixedName];
}
catch
{
return System.Data.DbType.UInt64;
}
}
#>

Good grief, that's a lot of code, how do I use it? The basic usage would be something like


<Tables>
<#= GetAstTableNodeFromQuery(this.RootNode.OleDbConnections["msdb"], "SELECT 100 AS demo", "dbo", "DemoQuery").GetBiml() #>
</Tables>

The call to GetAstTableNodeFromQuery returns an AstTableNode, which is great, but what we really want is the Biml behind it, so we chain a call to .GetBiml() onto the end.

What would make that better, though, is to make it a little more dynamic. Let's improve the code to create tables based on pairs of names and queries. I'm going to use a Dictionary called namedQueries to hold the names and queries and then enumerate through them, calling our GetAstTableNodeFromQuery for each entry.


<#
Dictionary<string, string> namedQueries = new Dictionary<string,string>{{"Query 28", @"-- Drive level latency information (Query 28) (Drive Level Latency)
-- Based on code from Jimmy May
SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point],
CASE
WHEN num_of_reads = 0 THEN 0
ELSE (io_stall_read_ms/num_of_reads)
END AS [Read Latency],
CASE
WHEN num_of_writes = 0 THEN 0
ELSE (io_stall_write_ms/num_of_writes)
END AS [Write Latency],
CASE
WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
ELSE (io_stall/(num_of_reads + num_of_writes))
END AS [Overall Latency],
CASE
WHEN num_of_reads = 0 THEN 0
ELSE (num_of_bytes_read/num_of_reads)
END AS [Avg Bytes/Read],
CASE
WHEN num_of_writes = 0 THEN 0
ELSE (num_of_bytes_written/num_of_writes)
END AS [Avg Bytes/Write],
CASE
WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes))
END AS [Avg Bytes/Transfer]
FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs
GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
ORDER BY [Overall Latency] OPTION (RECOMPILE);"
}};
#>
<Tables>
<# foreach(var kvp in namedQueries){ #>
<#= GetAstTableNodeFromQuery(this.RootNode.OleDbConnections["msdb"], kvp.Value, "dbo", kvp.Key).GetBiml() #>
<# } #>
</Tables>

How can we improve this? Let's get rid of the hard coded query names and actual queries. Tune in to the next installment to see how we'll make that work.

Full code is over on github

Broken View Finder


Broken View Finder

Shh, shhhhhh, we're being very very quiet, we're hunting broken views. Recently, we were asked to migrate some code changes and after doing so, the requesting team told us we had broken all of their views, but they couldn't tell us what was broken, just that everything was. After a quick rollback to snapshot, thank you Red Gate SQL Compare, I thought it'd be enlightening to see whether anything was broken before our code had been deployed.

You'll never guess what we discovered </clickbait>

How can you tell a view is broken

The easiest way is SELECT TOP 1 * FROM dbo.MyView; but then you need to figure out all of your views.

That's easy enough, SELECT * FROM sys.schemas AS S INNER JOIN sys.views AS V ON V.schema_id = S.schema_id;
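
Putting those two together, the brute-force test would look something like this rough sketch of mine (not from the post): generate a probe query per view, then execute each one and catch the failures.

SELECT
    CONCAT('SELECT TOP 1 * FROM ', QUOTENAME(S.name), '.', QUOTENAME(V.name), ';') AS ProbeQuery
FROM
    sys.views AS V
    INNER JOIN
        sys.schemas AS S
        ON S.schema_id = V.schema_id;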

But you know, there's something built into SQL Server that will actually test your views - sys.sp_refreshview. That's much cleaner than running sys.sp_executesql with our SELECT TOP 1s


-- This script identifies broken views
-- and at least the first error with it
SET NOCOUNT ON;
DECLARE
CSR CURSOR
FAST_FORWARD
FOR
SELECT
CONCAT(QUOTENAME(S.name), '.', QUOTENAME(V.name)) AS vname
FROM
sys.views AS V
INNER JOIN
sys.schemas AS S
ON S.schema_id = V.schema_id;

DECLARE
@viewname nvarchar(776);
DECLARE
@BROKENVIEWS table
(
viewname nvarchar(776)
, ErrorMessage nvarchar(4000)
, ErrorLine int
);

OPEN
CSR;
FETCH
NEXT FROM CSR INTO @viewname;

WHILE
@@FETCH_STATUS = 0
BEGIN

BEGIN TRY
EXECUTE sys.sp_refreshview
@viewname;
END TRY
BEGIN CATCH
INSERT INTO @BROKENVIEWS(viewname, ErrorMessage, ErrorLine)
VALUES
(
@viewname
, ERROR_MESSAGE()
, ERROR_LINE()
);

END CATCH

FETCH
NEXT FROM CSR INTO @viewname;
END

CLOSE CSR;
DEALLOCATE CSR;

SELECT
B.*
FROM
@BROKENVIEWS AS B

Can you think of ways to improve this? Either way, happy hunting!


Temporal table maker


Temporal table maker

This post is another in the continuing theme of "making things consistent." We were voluntold to help another team get their staging environment set up. Piece of cake, SQL Compare made it trivial to snap the tables over.

Oh, we don't want these tables in Custom schema, we want them in dbo. No problem, SQL Compare again and change owner mappings and bam, out come all the tables.

Oh, can we get this in near real-time? Say every 15 minutes. ... Transaction replication to the rescue!

Oh, we don't know what data we need yet so could you keep it all, forever? ... Temporal tables to the rescue?

Yes, temporal tables is perfect. But don't put the history table in the same schema as the table, put in this one. And put all of that in its own file group.

And that's what this script does. It

  • generates a table definition for an existing table, copying it into a new schema while also adding in the start/stop columns for temporal tables.
  • creates the clustered columnstore index command
  • creates a non-clustered index against the start/stop columns and the natural key(s)
  • Alters the original table to add in our start/stop columns with defaults and the period
  • Alters the original table to turn on versioning

    How does it do all that? It finds all the tables that exist in our source schema and don't yet exist in the target schema. I build out a SELECT * query against each table and feed it into sys.dm_exec_describe_first_result_set to identify the columns. And since sys.dm_exec_describe_first_result_set so nicely brings back the data type with length, precision, and scale specified, we might as well use that as well. By specifying a value of 1 for the browse_information_mode parameter, we get the key columns defined for us, which is handy when we want to make our non-clustered index.
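
    A quick sketch of my own showing what browse mode 1 adds; is_part_of_unique_key is what the non-clustered index builder below leans on.

    SELECT
        DEDFRS.name
        , DEDFRS.system_type_name
        , DEDFRS.is_part_of_unique_key
    FROM
        sys.dm_exec_describe_first_result_set(N'SELECT * FROM msdb.dbo.sysjobs;', N'', 1) AS DEDFRS;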


    DECLARE
    @query nvarchar(4000)
    , @targetSchema sysname = 'dbo_HISTORY'
    , @tableName sysname
    , @targetFileGroup sysname = 'History'

    DECLARE
    CSR CURSOR
    FAST_FORWARD
    FOR
    SELECT ALL
    CONCAT(
    'SELECT * FROM '
    , s.name
    , '.'
    , t.name)
    , t.name
    FROM
    sys.schemas AS S
    INNER JOIN sys.tables AS T
    ON T.schema_id = S.schema_id
    WHERE
    1=1
    AND S.name = 'dbo'
    AND T.name NOT IN
    (SELECT TI.name FROM sys.schemas AS SI INNER JOIN sys.tables AS TI ON TI.schema_id = SI.schema_id WHERE SI.name = @targetSchema)

    ;
    OPEN CSR;
    FETCH NEXT FROM CSR INTO @query, @tableName;
    WHILE @@FETCH_STATUS = 0
    BEGIN
    -- do something
    SELECT
    CONCAT
    (
    'CREATE TABLE '
    , @targetSchema
    , '.'
    , @tableName
    , '('
    , STUFF
    (
    (
    SELECT
    CONCAT
    (
    ','
    , DEDFRS.name
    , ' '
    , DEDFRS.system_type_name
    , ' '
    , CASE DEDFRS.is_nullable
        WHEN 1 THEN ''
        ELSE 'NOT '
      END
    , 'NULL'
    )
    FROM
    sys.dm_exec_describe_first_result_set(@query, N'', 1) AS DEDFRS
    ORDER BY
    DEDFRS.column_ordinal
    FOR XML PATH('')
    )
    , 1
    , 1
    , ''
    )
    , ', SysStartTime datetime2(7) NOT NULL'
    , ', SysEndTime datetime2(7) NOT NULL'
    , ')'
    , ' ON '
    , @targetFileGroup
    , ';'
    , CHAR(13)
    , 'CREATE CLUSTERED COLUMNSTORE INDEX CCI_'
    , @targetSchema
    , '_'
    , @tableName
    , ' ON '
    , @targetSchema
    , '.'
    , @tableName
    , ' ON '
    , @targetFileGroup
    , ';'
    , CHAR(13)
    , 'CREATE NONCLUSTERED INDEX IX_'
    , @targetSchema
    , '_'
    , @tableName
    , '_PERIOD_COLUMNS '
    , ' ON '
    , @targetSchema
    , '.'
    , @tableName

    , '('
    , 'SysEndTime'
    , ',SysStartTime'
    , (
    SELECT
    CONCAT
    (
    ','
    , DEDFRS.name
    )
    FROM
    sys.dm_exec_describe_first_result_set(@query, N'', 1) AS DEDFRS
    WHERE
    DEDFRS.is_part_of_unique_key = 1
    ORDER BY
    DEDFRS.column_ordinal
    FOR XML PATH('')
    )
    , ')'
    , ' ON '
    , @targetFileGroup
    , ';'
    , CHAR(13)
    , 'ALTER TABLE '
    , 'dbo'
    , '.'
    , @tableName
    , ' ADD '
    , 'SysStartTime datetime2(7) GENERATED ALWAYS AS ROW START HIDDEN'
    , ' CONSTRAINT DF_'
    , 'dbo_'
    , @tableName
    , '_SysStartTime DEFAULT SYSUTCDATETIME()'
    , ', SysEndTime datetime2(7) GENERATED ALWAYS AS ROW END HIDDEN'
    , ' CONSTRAINT DF_'
    , 'dbo_'
    , @tableName
    , '_SysEndTime DEFAULT DATETIME2FROMPARTS(9999, 12, 31, 23,59, 59,9999999,7)'
    , ', PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);'
    , CHAR(13)
    , 'ALTER TABLE '
    , 'dbo'
    , '.'
    , @tableName
    , ' SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = '
    , @targetSchema
    , '.'
    , @tableName
    , '));'

    )

    FETCH NEXT FROM CSR INTO @query, @tableName;
    END
    CLOSE CSR;
    DEALLOCATE CSR;

    Lessons learned

    The examples I cobbled together from MSDN were great, until they weren't. Be wary of anyone who doesn't specify lengths - one example used datetime2 for the start/stop columns, the other specified datetime2(0). The default precision with datetime2 is 7, which is very much not 0. Those data type differences made the temporal table and its history table incompatible.

    Cleaning up from that mess was ugly. I couldn't drop the start/stop columns until I dropped the PERIOD. One doesn't drop a PERIOD though; one has to DROP PERIOD FOR SYSTEM_TIME.
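
    A sketch of my own of the cleanup order that works; dbo.MyTable, its dbo_HISTORY counterpart, and the constraint names stand in for the real objects.

    -- Versioning off first, then the period, then the columns and their defaults
    ALTER TABLE dbo.MyTable SET (SYSTEM_VERSIONING = OFF);
    ALTER TABLE dbo.MyTable DROP PERIOD FOR SYSTEM_TIME;
    ALTER TABLE dbo.MyTable DROP CONSTRAINT DF_dbo_MyTable_SysStartTime;
    ALTER TABLE dbo.MyTable DROP CONSTRAINT DF_dbo_MyTable_SysEndTime;
    ALTER TABLE dbo.MyTable DROP COLUMN SysStartTime, SysEndTime;
    DROP TABLE dbo_HISTORY.MyTable;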

    I prefer to use the *FromParts methods where I can, so that's in my default instead of casting strings. Out, ambiguity of internationalization!

    This doesn't account for tables with bad names or tables without primary/unique keys defined. My domain was clean, so beware of treating this as a general purpose temporal table maker.

    Improvements

    How can you make this better? My hard coded dbo should have been abstracted out to a @sourceSchema variable. I should have used QUOTENAME for all my entity names. I could have stuffed all those commands into a table or invoked them directly with an sp_executesql call. I should have abused CONCAT more. Wait, that's done. That's very well done.

    Finally, you are responsible for the results of this script. Don't run it anywhere without evaluating and understanding the consequences.

What's my transaction isolation level


    What's my transaction isolation level

    That's an easy question to answer - StackOverflow has a fine answer.

    But what if I use sp_executesql to run some dynamic SQL - does it default to the connection's isolation level? If I change the isolation level within the query, does it propagate back to the invoker? That's a great question, William. Let's find out.


    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT CASE transaction_isolation_level
    WHEN 0 THEN 'Unspecified'
    WHEN 1 THEN 'ReadUncommitted'
    WHEN 2 THEN 'ReadCommitted'
    WHEN 3 THEN 'Repeatable'
    WHEN 4 THEN 'Serializable'
    WHEN 5 THEN 'Snapshot' END AS TRANSACTION_ISOLATION_LEVEL
    FROM sys.dm_exec_sessions
    where session_id = @@SPID;

    DECLARE
    @query nvarchar(max) = N'-- Identify iso level
    SELECT CASE transaction_isolation_level
    WHEN 0 THEN ''Unspecified''
    WHEN 1 THEN ''ReadUncommitted''
    WHEN 2 THEN ''ReadCommitted''
    WHEN 3 THEN ''Repeatable''
    WHEN 4 THEN ''Serializable''
    WHEN 5 THEN ''Snapshot'' END AS TRANSACTION_ISOLATION_LEVEL
    FROM sys.dm_exec_sessions
    where session_id = @@SPID;

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

    -- Test iso level
    SELECT CASE transaction_isolation_level
    WHEN 0 THEN ''Unspecified''
    WHEN 1 THEN ''ReadUncommitted''
    WHEN 2 THEN ''ReadCommitted''
    WHEN 3 THEN ''Repeatable''
    WHEN 4 THEN ''Serializable''
    WHEN 5 THEN ''Snapshot'' END AS TRANSACTION_ISOLATION_LEVEL
    FROM sys.dm_exec_sessions
    where session_id = @@SPID'

    EXECUTE sys.sp_executesql @query, N'';

    SELECT CASE transaction_isolation_level
    WHEN 0 THEN 'Unspecified'
    WHEN 1 THEN 'ReadUncommitted'
    WHEN 2 THEN 'ReadCommitted'
    WHEN 3 THEN 'Repeatable'
    WHEN 4 THEN 'Serializable'
    WHEN 5 THEN 'Snapshot' END AS TRANSACTION_ISOLATION_LEVEL
    FROM sys.dm_exec_sessions
    where session_id = @@SPID;

    I begin my session in read uncommitted, a.k.a. "nolock". I then run dynamic SQL which identifies my isolation level (still read uncommitted), changes it to a different level (confirmed at read committed), and then exits; checking my final state, I'm back at read uncommitted.

    Finally, thanks to Andrew Kelly (b|t) for answering the #sqlhelp call.

    Python Azure Function requestor's IP address


    Python Azure Function requestor's IP address

    I'm working on an anonymous-level Azure Function in python and couldn't find where the IP address of the caller, if applicable, is stored. It's in the request headers, which makes sense, but I only figured that out after spending far too much time looking in all the wrong places. A minimal reproduction would look something like


    import os
    iptag = "REQ_HEADERS_X-FORWARDED-FOR"
    ip = "Tag name:{} Tag value:{}".format(iptag, os.environ[iptag])
    print(ip)

    Now, something to note is that it will return not only the IP address but the port the call came in through. Thus, I see a value of 192.168.1.200:33496 instead of just the ipv4 value.

    Knowing where to look, I can see that the heavy lifting had already been done by the most excellent HTTPHelper but as a wise man once said: knowing is half the battle.


    import os
    from AzureHTTPHelper import HTTPHelper
    http = HTTPHelper()
    #Notice the lower casing of properties here and the trimming of the type (REQ_HEADERS)
    iptag = "x-forwarded-for"
    ip = "Tag name:{} Tag value:{}".format(iptag, http.headers[iptag])
    print(ip)

    Yo Joe!


    Staging Metadata Framework for the Unknown


    Staging metadata framework for the unknown

    That's a terrible title but it's the best I got. A client would like to report out of ServiceNow some metrics not readily available in the PowerBI App. The first time I connected, I got a quick look at the Incidents and some of the data we'd be interested in but I have no idea how that data changes over time. When you first open a ticket, maybe it doesn't have a resolved date or a caused by field populated. And since this is all web service stuff and you can customize it, I knew I was looking at lots of iterations to try and keep up with all the data coming back from the service. How can I handle this and keep sane? Those were my two goals. I thought it'd be fun to share how I solved the problem using features in SQL Server 2016.

    To begin, I created a database called RTMA to perform my real-time metrics analysis:

    CREATE DATABASE RTMA;

    With that done, I created a schema within my database:

    USE RTMA;
    GO
    CREATE SCHEMA ServiceNow AUTHORIZATION dbo;

    Next, we need a table to hold our discovery metadata.


    CREATE TABLE
    ServiceNow.ColumnSizing
    (
    EntityName varchar(30) NOT NULL
    , CollectionName varchar(30) NOT NULL
    , ColumnName varchar(30) NOT NULL
    , ColumnLength int NOT NULL
    , InsertDate datetime NOT NULL
    CONSTRAINT DF_ServiceNow_ColumnSizing_InsertDate DEFAULT (GETDATE())
    );

    CREATE CLUSTERED COLUMNSTORE INDEX
    CCI_ServiceNow_ColumnSizing
    ON ServiceNow.ColumnSizing;
    The idea for this metadata table is that we'll just keep adding more information in for the entities we survey. All that matters is the largest length for a given combination of Entity, Collection, and Column.

    In the following demo, we'll add 2 rows into our table. The first batch will be our initial sizing and then "something" happens and we discover the size has increased.


    INSERT INTO
    ServiceNow.ColumnSizing
    (
    EntityName
    , CollectionName
    , ColumnName
    , ColumnLength
    , InsertDate
    )
    VALUES
    ('DoesNotExist', 'records', 'ABC', 10, current_timestamp)
    , ('DoesNotExist', 'records', 'BCD', 30, current_timestamp);

    Create a base table for our DoesNotExist. What columns will be available? I know I'll want my InsertDate and that's the only thing I'll guarantee to begin. And that's ok because we're going to get clever.


    DECLARE @entity nvarchar(30) = N'DoesNotExist'
    -- <Entity/> and <Columns/> are placeholder tokens swapped out by the REPLACE calls below
    , @Template nvarchar(max) = N'DROP TABLE IF EXISTS ServiceNow.Stage<Entity/>;
    CREATE TABLE
    ServiceNow.Stage<Entity/>
    (
    <Columns/>
    InsertDate datetime CONSTRAINT DF_ServiceNow_Stage<Entity/>_InsertDate DEFAULT (GETDATE())
    );
    CREATE CLUSTERED COLUMNSTORE INDEX
    CCI_ServiceNow_Stage<Entity/>
    ON
    ServiceNow.Stage<Entity/>;'
    , @Columns nvarchar(max) = N'';

    DECLARE @Query nvarchar(max) = REPLACE(REPLACE(@Template, '<Entity/>', @Entity), '<Columns/>', @Columns);
    EXECUTE sys.sp_executesql @Query, N'';

    We now have a table with one column so let's look at using our synthetic metadata (ColumnSizing) to augment it. The important thing to understand in the next block of code is that we'll use FOR XML PATH('') to concatenate rows together and the CONCAT function to concatenate values together.

    See more here for the XML PATH "trick"

    If we're going to define columns for a table, it follows that we need to know what table needs what columns and what size those columns should be. So, let the following block be that definition.


    DECLARE @Entity varchar(30) = 'DoesNotExist';

    SELECT
    CS.EntityName
    , CS.CollectionName
    , CS.ColumnName
    , MAX(CS.ColumnLength) AS ColumnLength
    FROM
    ServiceNow.ColumnSizing AS CS
    WHERE
    CS.ColumnLength > 0
    AND CS.ColumnLength =
    (
    SELECT
    MAX(CSI.ColumnLength) AS ColumnLength
    FROM
    ServiceNow.ColumnSizing AS CSI
    WHERE
    CSI.EntityName = CS.EntityName
    AND CSI.ColumnName = CS.ColumnName
    )
    AND CS.EntityName = @Entity
    GROUP BY
    CS.EntityName
    , CS.CollectionName
    , CS.ColumnName;

    We run the above query and that looks like what we want so into the FOR XML machine it goes.

    DECLARE @Entity varchar(30) = 'DoesNotExist'
    , @ColumnSizeDeclaration varchar(max);

    ;WITH BASE_DATA AS
    (
    -- Define the base data we'll use to drive creation
    SELECT
    CS.EntityName
    , CS.CollectionName
    , CS.ColumnName
    , MAX(CS.ColumnLength) AS ColumnLength
    FROM
    ServiceNow.ColumnSizing AS CS
    WHERE
    CS.ColumnLength > 0
    AND CS.ColumnLength =
    (
    SELECT
    MAX(CSI.ColumnLength) AS ColumnLength
    FROM
    ServiceNow.ColumnSizing AS CSI
    WHERE
    CSI.EntityName = CS.EntityName
    AND CSI.ColumnName = CS.ColumnName
    )
    AND CS.EntityName = @Entity
    GROUP BY
    CS.EntityName
    , CS.CollectionName
    , CS.ColumnName
    )
    SELECT DISTINCT
    BD.EntityName
    , (
    SELECT
    CONCAT
    (
    ''
    , BDI.ColumnName
    , ' varchar('
    , BDI.ColumnLength
    , '),'
    )
    FROM
    BASE_DATA AS BDI
    WHERE
    BDI.EntityName = BD.EntityName
    AND BDI.CollectionName = BD.CollectionName
    FOR XML PATH('')
    ) AS ColumnSizeDeclaration
    FROM
    BASE_DATA AS BD;

    That looks like a lot, but it's not. Run it and you'll see we get one row with two elements: "DoesNotExist" and "ABC varchar(10),BCD varchar(30),". That trailing comma is going to be a problem; that's generally why you see people either use a leading delimiter and STUFF to remove it or, in the case of a trailing delimiter, LEFT with LEN - 1.
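
    For reference, a small sketch of my own of those two usual cleanups:

    DECLARE @leading varchar(100) = ',ABC varchar(10),BCD varchar(30)';
    DECLARE @trailing varchar(100) = 'ABC varchar(10),BCD varchar(30),';

    -- Leading delimiter: STUFF replaces the first character with an empty string
    SELECT STUFF(@leading, 1, 1, '') AS CleanedLeading;

    -- Trailing delimiter: LEFT keeps everything but the final character
    SELECT LEFT(@trailing, LEN(@trailing) - 1) AS CleanedTrailing;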

    But we're clever and don't need such tricks. If you look at the declaration for @Template, we assume there will *always* be a final column of InsertDate, which doesn't have a comma preceding it. Always define the rules to favor yourself. ;)

    Instead of the static table declaration we used, let's marry our common table expression, CTE, with the table template.


    DECLARE @entity nvarchar(30) = N'DoesNotExist'
    -- <Entity/> and <Columns/> are placeholder tokens swapped out by the REPLACE calls below
    , @Template nvarchar(max) = N'DROP TABLE IF EXISTS ServiceNow.Stage<Entity/>;
    CREATE TABLE
    ServiceNow.Stage<Entity/>
    (
    <Columns/>
    InsertDate datetime CONSTRAINT DF_ServiceNow_Stage<Entity/>_InsertDate DEFAULT (GETDATE())
    );
    CREATE CLUSTERED COLUMNSTORE INDEX
    CCI_ServiceNow_Stage<Entity/>
    ON
    ServiceNow.Stage<Entity/>;'
    , @Columns nvarchar(max) = N'';

    -- CTE logic patched in here

    ;WITH BASE_DATA AS
    (
    -- Define the base data we'll use to drive creation
    SELECT
    CS.EntityName
    , CS.CollectionName
    , CS.ColumnName
    , MAX(CS.ColumnLength) AS ColumnLength
    FROM
    ServiceNow.ColumnSizing AS CS
    WHERE
    CS.ColumnLength > 0
    AND CS.ColumnLength =
    (
    SELECT
    MAX(CSI.ColumnLength) AS ColumnLength
    FROM
    ServiceNow.ColumnSizing AS CSI
    WHERE
    CSI.EntityName = CS.EntityName
    AND CSI.ColumnName = CS.ColumnName
    )
    AND CS.EntityName = @Entity
    GROUP BY
    CS.EntityName
    , CS.CollectionName
    , CS.ColumnName
    )
    SELECT DISTINCT
    @Columns = (
    SELECT
    CONCAT
    (
    ''
    , BDI.ColumnName
    , ' varchar('
    , BDI.ColumnLength
    , '),'
    )
    FROM
    BASE_DATA AS BDI
    WHERE
    BDI.EntityName = BD.EntityName
    AND BDI.CollectionName = BD.CollectionName
    FOR XML PATH('')
    )
    FROM
    BASE_DATA AS BD;

    DECLARE @Query nvarchar(max) = REPLACE(REPLACE(@Template, '<Entity/>', @Entity), '<Columns/>', @Columns);
    EXECUTE sys.sp_executesql @Query, N'';

    Bam, look at it now. We took advantage of the new DROP IF EXISTS (DIE) syntax to drop our table and we've redeclared it, nice as can be. Don't take my word for it though, ask the system tables what they see.


    SELECT
    S.name AS SchemaName
    , T.name AS TableName
    , C.name AS ColumnName
    , T2.name AS DataTypeName
    , C.max_length
    FROM
    sys.schemas AS S
    INNER JOIN
    sys.tables AS T
    ON T.schema_id = S.schema_id
    INNER JOIN
    sys.columns AS C
    ON C.object_id = T.object_id
    INNER JOIN
    sys.types AS T2
    ON T2.user_type_id = C.user_type_id
    WHERE
    S.name = 'ServiceNow'
    AND T.name = 'StageDoesNotExist'
    ORDER BY
    S.name
    , T.name
    , C.column_id;
    Excellent, we now turn on the actual data storage process and voila, we get a value stored into our table. Simulate it with the following.

    INSERT INTO ServiceNow.StageDoesNotExist
    (ABC, BCD) VALUES ('Important', 'Very, very important');
    Truly, all is well and good.

    *time passes*

    Then, this happens


    WAITFOR DELAY ('00:00:03');

    INSERT INTO
    ServiceNow.ColumnSizing
    (
    EntityName
    , CollectionName
    , ColumnName
    , ColumnLength
    , InsertDate
    )
    VALUES
    ('DoesNotExist', 'records', 'BCD', 34, current_timestamp);
    Followed by

    INSERT INTO ServiceNow.StageDoesNotExist
    (ABC, BCD) VALUES ('Important','Very important, yet ephemeral data');
    To quote Dr. Beckett: Oh boy

    What are all the functions and their parameters?


    What are all the functions and their parameters?

    File this one under: I wrote it once, may I never need it again

    In my ever-expanding quest for gathering all the metadata, how could I determine the metadata for all my table valued functions? No problem, that's what sys.dm_exec_describe_first_result_set is for: SELECT * FROM sys.dm_exec_describe_first_result_set(N'SELECT * FROM dbo.foo(@xmlMessage)', N'@xmlMessage nvarchar(max)', 1) AS DEDFRS

    Except, I need to know parameters. And I need to know parameter types. And order. Fortunately, sys.parameters and sys.types make this easy. The only ugliness comes from the double invocation of the row rollups.



    SELECT
    CONCAT
    (
    ''
    , 'SELECT * FROM '
    , QUOTENAME(S.name)
    , '.'
    , QUOTENAME(O.name)
    , '('
    -- Parameters here without type
    , STUFF
    (
    (
    SELECT
    CONCAT
    (
    ''
    , ','
    , P.name
    , ''
    )
    FROM
    sys.parameters AS P
    WHERE
    P.is_output = CAST(0 AS bit)
    AND P.object_id = O.object_id
    ORDER BY
    P.parameter_id
    FOR XML PATH('')
    )
    , 1
    , 1
    , ''
    )

    , ') AS F;'
    ) AS SourceQuery
    , (
    STUFF
    (
    (
    SELECT
    CONCAT
    (
    ''
    , ','
    , P.name
    , ''
    , CASE
    WHEN T2.name LIKE '%char' THEN CONCAT(T2.name, '(', CASE P.max_length WHEN -1 THEN 'max' ELSE CAST(P.max_length AS varchar(4)) END, ')')
    WHEN T2.name = 'time' OR T2.name ='datetime2' THEN CONCAT(T2.name, '(', P.scale, ')')
    WHEN T2.name = 'numeric' THEN CONCAT(T2.name, '(', P.precision, ',', P.scale, ')')
    ELSE T2.name
    END
    )
    FROM
    sys.parameters AS P
    INNER JOIN
    sys.types AS T2
    ON T2.user_type_id = P.user_type_id
    WHERE
    P.is_output = CAST(0 AS bit)
    AND P.object_id = O.object_id
    ORDER BY
    P.parameter_id
    FOR XML PATH('')
    )
    , 1
    , 1
    , ''
    )
    ) AS ParameterList
    FROM
    sys.schemas AS S
    INNER JOIN
    sys.objects AS O
    ON O.schema_id = S.schema_id
    WHERE
    O.type IN ('FT','IF', 'TF');

    How you use this is up to you. I plan on hooking it into the Biml Query Table Builder to simulate tables for all my TVFs.

    Pop quiz - altering column types


    Pop quiz

    Given the following DDL


    CREATE TABLE dbo.IntToTime
    (
    CREATE_TIME int
    );

    What will be the result of issuing the following command?

    ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME time NULL;

    Clearly, if I'm asking, it's not what you might expect. How can an empty table not allow you to change data types? Well, it seems time and datetime2 are special cases, as they'll raise errors of the form

    Msg 206, Level 16, State 2, Line 47 Operand type clash: int is incompatible with time

    If you're in this situation and need to get the type converted, you'll need to make two hops, one to varchar and then to time.


    ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME varchar(10) NULL;
    ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME time NULL;

    Altering table types, part 2


    Altering table types - a compatibility guide

    In yesterday's post, I altered a table type. Pray I don't alter them further. What else is incompatible with an integer column? It's just a morbid curiosity at this point as I don't recall having ever seen this after working with SQL Server for 18 years. Side note, dang I'm old

    How best to answer the question? By interrogating sys.types and throwing operations against the wall to see what does and doesn't stick.


    DECLARE
    @Results table
    (
    TypeName sysname, Failed bit, ErrorMessage nvarchar(4000)
    );

    DECLARE
    @DoOver nvarchar(4000) = N'DROP TABLE IF EXISTS dbo.IntToTime;
    CREATE TABLE dbo.IntToTime (CREATE_TIME int);'
    , @alter nvarchar(4000) = N'ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME @type'
    , @query nvarchar(4000) = NULL
    , @typeName sysname = 'datetime';

    DECLARE
    CSR CURSOR
    FORWARD_ONLY
    FOR
    SELECT
    T.name
    FROM
    sys.types AS T
    WHERE
    T.is_user_defined = 0

    OPEN CSR;
    FETCH NEXT FROM CSR INTO @typeName
    WHILE @@FETCH_STATUS = 0
    BEGIN
    BEGIN TRY
    EXECUTE sys.sp_executesql @DoOver, N'';
    SELECT @query = REPLACE(@alter, N'@type', @typeName);
    EXECUTE sys.sp_executesql @query, N'';

    INSERT INTO
    @Results
    (
    TypeName
    , Failed
    , ErrorMessage
    )
    SELECT @typeName, CAST(0 AS bit), ERROR_MESSAGE();
    END TRY
    BEGIN CATCH
    INSERT INTO
    @Results
    (
    TypeName
    , Failed
    , ErrorMessage
    )
    SELECT @typeName, CAST(1 AS bit), ERROR_MESSAGE()
    END CATCH
    FETCH NEXT FROM CSR INTO @typeName
    END
    CLOSE CSR;
    DEALLOCATE CSR;

    SELECT
    *
    FROM
    @Results AS R
    ORDER BY
    2,1;
TypeName          Failed  ErrorMessage
bigint            0
binary            0
bit               0
char              0
datetime          0
decimal           0
float             0
int               0
money             0
nchar             0
numeric           0
nvarchar          0
real              0
smalldatetime     0
smallint          0
smallmoney        0
sql_variant       0
sysname           0
tinyint           0
varbinary         0
varchar           0
date              1       Operand type clash: int is incompatible with date
datetime2         1       Operand type clash: int is incompatible with datetime2
datetimeoffset    1       Operand type clash: int is incompatible with datetimeoffset
geography         1       Operand type clash: int is incompatible with geography
geometry          1       Operand type clash: int is incompatible with geometry
hierarchyid       1       Operand type clash: int is incompatible with hierarchyid
image             1       Operand type clash: int is incompatible with image
ntext             1       Operand type clash: int is incompatible with ntext
text              1       Operand type clash: int is incompatible with text
time              1       Operand type clash: int is incompatible with time
timestamp         1       Cannot alter column 'CREATE_TIME' to be data type timestamp.
uniqueidentifier  1       Operand type clash: int is incompatible with uniqueidentifier
xml               1       Operand type clash: int is incompatible with xml
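
For the operand-clash types, the two-hop workaround from the previous post should still apply; a minimal sketch for date (untested against the rest of the list):

ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME varchar(10) NULL;
ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME date NULL;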

    Pop Quiz - REPLACE in SQL Server


    It's amazing the things I've run into with SQL Server this week that I never noticed. In today's pop quiz, let's look at REPLACE


    DECLARE
    @Repro table
    (
    SourceColumn varchar(30)
    );

    INSERT INTO
    @Repro
    (
    SourceColumn
    )
    SELECT
    D.SourceColumn
    FROM
    (
    VALUES
    ('None')
    , ('ABC')
    , ('BCD')
    , ('DEF')
    )D(SourceColumn);

    SELECT
    R.SourceColumn
    , REPLACE(R.SourceColumn, 'None', NULL) AS wat
    FROM
    @Repro AS R;

    In the preceding example, I load 4 rows into a table and call the REPLACE function on it. Why? Because some numbskull front end developer entered None instead of a NULL for a non-existent value. No problem, I will simply replace all None with NULL. So, what's the value of the wat column?

Well, if you're one of those people who reads the instruction manual before attempting anything, you'd have seen that REPLACE "Returns NULL if any one of the arguments is NULL." Otherwise, you're like me, thinking "maybe I put the arguments in the wrong order." Nope. REPLACE(R.SourceColumn, 'None', '') AS EmptyString does work, though. So what the heck? Guess I'll actually read the manual... No, wait, I can just use NULLIF to turn the empty strings into NULLs: NULLIF(REPLACE(R.SourceColumn, 'None', ''), '') AS EmptyStringToNull
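
Putting those attempts side by side - this is just the snippets above rolled into one query against the same table variable, nothing new:

SELECT
R.SourceColumn
, REPLACE(R.SourceColumn, 'None', NULL) AS wat
, REPLACE(R.SourceColumn, 'None', '') AS EmptyString
, NULLIF(REPLACE(R.SourceColumn, 'None', ''), '') AS EmptyStringToNull
FROM
@Repro AS R;

The wat column comes back NULL for every single row, EmptyString swaps None for an empty string, and EmptyStringToNull finally returns NULL for the None rows and the original value for everything else.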

Much better: replace all my instances of None with an empty string and then convert anything that is an empty string to NULL. Wait, what? You know what would be better? Skipping the REPLACE call altogether.


    SELECT
    R.SourceColumn
    , NULLIF(R.SourceColumn, 'None') AS MuchBetter
    FROM
    @Repro AS R;

    Moral of the story and/or quiz: once you have a working solution, rubber duck out your approach to see if there's an opportunity for improvement (only after having committed the working version to source control).

    Python pandas repeating character tester


At one of our clients, we are profiling data. They have a mainframe that has been running for so long, they no longer have SMEs for their data. We've been able to leverage Service Broker to provide a real-time (under 3 seconds) remote file store for their data. It's pretty cool, but now they are trying to do something with the data, so we need to understand what the data looks like. We're using a mix of TSQL and python to understand nullability, value variances, etc. One of the "interesting" things we've discovered is that they loved placeholder values. Everyone knows a date of 88888888 is a placeholder for the actual date, which they'll get two steps later in the workflow. Except sometimes we use 99999999, because the eights are the placeholder for the time.

Initially, we were just searching for one sentinel value, then two values, until we saw the bigger pattern of "repeated values probably mean something." For us, this matters because we then need to discard those rows when testing for data type suitability. 88888888 isn't a valid date, so our logic might determine that the column is best served by a numeric data type. Unless we exclude the eights value, in which case we get a 100% match rate on the column's ability to be converted to a date.
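
On the TSQL side of the profiling, a rough sketch of the same idea - flag the all-one-character values so they can be excluded before scoring type suitability - might look like the following. The values and column name here are made up for illustration.

SELECT
D.SourceValue
, CASE
WHEN D.SourceValue = REPLICATE(LEFT(D.SourceValue, 1), LEN(D.SourceValue)) THEN 1
ELSE 0
END AS IsRepeatedCharacter
, TRY_CONVERT(date, D.SourceValue) AS AsDate
FROM
(
VALUES
('88888888')
, ('99999999')
, ('20180305')
, (NULL)
) D(SourceValue);

Rows where IsRepeatedCharacter is 1 get discarded before we score the column, which is how 88888888 stops poisoning an otherwise perfectly convertible date column.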

    How can we determine if a string is nothing but repeated values in python? There's a very clever test from StackOverflow

source == source[0] * len(source)

I would read that as "is the source variable exactly equal to the first character of source repeated for the length of source?"

    And that was good, until we hit a NULL (None in python-speak). We then took advantage of the ternary equivalent in python to make it

    (source == source[0] * len(source)) if source else False

    Enter Pandas (series)

Truth is a funny thing in a Pandas Series. Really, it is: "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." We were trying to apply the above function the same way we had been doing everything else

    df.MyColumn.str.len()
# this will fail magnificently
    (df.MyColumn == df.MyColumn[0] * len(df.MyColumn)) if df.MyColumn else False

It took me a while since I hadn't really used the pandas library beyond running what my coworker had done. What I needed to do was get a row context in which to apply the true/false calculation. As it stands, the Series machinery wants to aggregate the booleans, or something like that. And it makes sense from a SQL perspective: you can't really apply aggregates to bit fields (beyond COUNT).

    So, what's the solution? As always, you're likely to say the exact thing you're looking for. In this case, apply was the keyword.

    df.MyColumn.apply(lambda source: (source == source[0] * len(source)) if source else False)

    Full code you can play with would be


    import pandas
    import pprint

def isRepeated(src):
    return (src == src[0] * len(src)) if src else False

    df = pandas.DataFrame({"MyCol":pandas.Series(['AB', 'BC', 'BB', None])})

    pprint.pprint(df)

    print()
    # What rows have the same character in all of them?

    pprint.pprint(df.MyCol.apply(lambda source:(source == source[0] * len(source)) if source else False))
    #If you'd like to avoid the anonymous function...
    pprint.pprint(df.MyCol.apply(isRepeated))

In short, python is glorious and I'm happy to be writing in it again ;)


    2018 MVP Summit retrospective


    Another year of the MVP Summit is in the bag and as always, I have months worth of learning I'm excited to do.

    Thank you

I'd like to extend a hearty thank you to Microsoft and the various teams for hosting us. I can't imagine the sheer number of hours spent in preparation, the actual time not-spent-working-on-technology-X, much less the expense of caffeinating, feeding, lodging, and transporting us.

    What I'm excited about

    Stream Analytics

We have a high performance (60M messages per day, averaging 130ms throughput) messaging system that allows us to expose mainframe data as a SQL Server database for analytics. The devil with Service Broker is that there's no built-in monitoring. We have a clever dashboard built on a Power BI streaming dataset that provides an at-a-glance health check for data processing. What we need, though, is something that can drive action based on changes. The September changes in Stream Analytics look like the perfect fit. They allow us to detect not just hard limits (we've violated our 3 second SLA) but the squishier metrics: a background process just woke up and swamped us with a million rows in the past three minutes, or our processing time is trending upwards and someone needs to figure out why.

    SQL Graph improvements

    While we are not yet using graph features, I can see opportunities for it with our client that I want to build some proof of concept models.

    Cosmos DB

    Alongside the Stream Analytics improvements, perhaps we need to feed the change data into Cosmos and then leverage the Change Feed support to push to analytics processing. And just generally, I need to invest some time in Apache Spark. I also learned that I don't need to discover all the patterns for lambda architecture as it's already out there with a handy URL to boot.

    Cognitive Services

Ok, while picking up information about this was just to scratch a very silly itch, I was impressed by how easy it was from the web interface. I have bird feeders, and even though most seed will state that squirrels are not interested in it, that's a downright lie.

    Don't mind me, I'm just a fuzzy bird

    I want a camera pointed at my bird feeder and if a squirrel shows, I want to know about it. I used about a dozen pictures of my bird feeders with and without my nemesis to train the model and then fed back assorted photos to see how smart it was. Except for an image of a squirrel hiding in shadow, it was able to give me high confidence readings on what was featured in the photo. Here we can see that my dog is neither a bird nor a squirrel.
    Not a squirrel, just a lazy dog

    I'm so excited to get these bots built out. One for the Raspberry Pi to detect presence at the feeder and then an Azure based recognizer for friend versus foe. Once that's done, the next phase will be to identify specific bird species. And then tie it to feed type and feeder style (tray/platform versus house versus tube) and time of day and ... yes, lot of fun permutations that are easily available without having to learn all the computer vision and statistics. Feel free to give it a whirl at https://customvision.ai

    SQLOps studio

    This is the new cross platform SQL Server Management Studio replacement - sort of. It's not designed to do everything SSMS does but instead the vision is to solve the most needed problems and with the open source model, the community can patch in their own solutions. I'm excited to put together a better reporting interface for the SSISDB. Something that you can actually copy text out of - how crazy is that?

    Azure Data Lake Analytics

    It had been a year since I had worked through some of the ADLA/USQL so it was good to get back into the language and environment. I need to get on a project that is actually using the technology though to really cement my knowledge.

    What I learned

In October of 2016, I launched Sterling Data Consulting as my company. I sub under a good friend and it's been an adventure running a business, but I don't feel like I'm really running a business since I have no other business. One of my TODOs at the conference was to talk to other small shop owners to see if I could discover their "secret sauce." While I got assorted feedback, the two I want to send a special thank you to are John Sterrett of Procure SQL and Tim Radney. Their advice ranged from the straightforward ("I don't know what you do", "are you for hire?") to thoughts on lead acquisition and my lack of vision for sales.

Tim was also my roommate and it was great just getting to know him. We traded Boy Scout leader stories and he had excellent ideas for High Adventure fundraisers since that's something our troop is looking to do next year. For being a year younger than me, he sure had a lot more wisdom on the things I don't do or don't do well. You should check him out at the Atlanta SQL Saturday and attend his precon on Common SQL Server mistakes and how to avoid them.

    Photos

    Bellevue is less scenic than Seattle but the sunshine and warmth on Tuesday made for some nice photos of the treehouses. Yes, the Microsoft Campus has adult sized treehouses in it. How cool is that?

    Sort SQL Server tables into similarly sized buckets


You need to do something to all of the tables in SQL Server. That something can be anything: reindex/reorg, export the data, perform some other maintenance - it really doesn't matter. What does matter is that you'd like to get it done sooner rather than later. If time is no consideration, then you'd likely just do one table at a time until you've done them all. Sometimes, though, a maximum degree of parallelization of one is less than ideal. You're paying for more than one processor core, you might as well use it. The devil in splitting a workload out is ensuring the tasks are well balanced. When I'm staging data in SSIS, I often use a row count as an approximation for time cost. It's not perfect - a million-row table 430 columns wide might actually take longer than the 250-million-row key-value table.

A sincere tip of the hat to Daniel Hutmacher (b|t) for his answer on this StackExchange post. He has some great logic for sorting tables into approximately equally sized bins and it performs reasonably well.


    SET NOCOUNT ON;
    DECLARE
    @bucketCount tinyint = 6;

IF OBJECT_ID('tempdb..#work') IS NOT NULL
BEGIN
DROP TABLE #work;
END

CREATE TABLE #work (
_row int IDENTITY(1, 1) NOT NULL,
[SchemaName] sysname,
[TableName] sysname,
[RowsCounted] bigint NOT NULL,
GroupNumber int NOT NULL,
moved tinyint NOT NULL,
PRIMARY KEY CLUSTERED ([RowsCounted], _row)
    );

    WITH cte AS (
    SELECT B.RowsCounted
    , B.SchemaName
    , B.TableName
    FROM
    (
    SELECT
    s.[Name] as [SchemaName]
    , t.[name] as [TableName]
    , SUM(p.rows) as [RowsCounted]
    FROM
    sys.schemas s
LEFT OUTER JOIN
sys.tables t
ON s.schema_id = t.schema_id
LEFT OUTER JOIN
sys.partitions p
ON t.object_id = p.object_id
LEFT OUTER JOIN
sys.allocation_units a
ON p.partition_id = a.container_id
WHERE
p.index_id IN (0,1)
AND p.rows IS NOT NULL
AND a.type = 1
GROUP BY
s.[Name]
, t.[name]
) B
)

INSERT INTO #work ([RowsCounted], SchemaName, TableName, GroupNumber, moved)
SELECT [RowsCounted], SchemaName, TableName, ROW_NUMBER() OVER (ORDER BY [RowsCounted]) % @bucketCount AS GroupNumber, 0
    FROM cte;


    WHILE (@@ROWCOUNT!=0)
    WITH cte AS
    (
    SELECT
    *
, SUM(RowsCounted) OVER (PARTITION BY GroupNumber) - SUM(RowsCounted) OVER (PARTITION BY (SELECT NULL)) / @bucketCount AS _GroupNumberoffset
    FROM
    #work
    )
    UPDATE
    w
    SET
    w.GroupNumber = (CASE w._row
    WHEN x._pos_row THEN x._neg_GroupNumber
    ELSE x._pos_GroupNumber
    END
    )
    , w.moved = w.moved + 1
    FROM
#work AS w
INNER JOIN
(
SELECT TOP 1
    pos._row AS _pos_row
    , pos.GroupNumber AS _pos_GroupNumber
    , neg._row AS _neg_row
    , neg.GroupNumber AS _neg_GroupNumber
    FROM
    cte AS pos
INNER JOIN
    cte AS neg
    ON pos._GroupNumberoffset > 0
    AND neg._GroupNumberoffset < 0
    AND
    --- To prevent infinite recursion:
    pos.moved < @bucketCount
    AND neg.moved < @bucketCount
WHERE --- must improve positive side's offset:
    ABS(pos._GroupNumberoffset - pos.RowsCounted + neg.RowsCounted) <= pos._GroupNumberoffset
    AND
    --- must improve negative side's offset:
    ABS(neg._GroupNumberoffset - neg.RowsCounted + pos.RowsCounted) <= ABS(neg._GroupNumberoffset)
    --- Largest changes first:
ORDER BY
    ABS(pos.RowsCounted - neg.RowsCounted) DESC
    ) AS x
    ON w._row IN
    (
    x._pos_row
    , x._neg_row
    );

    Now what? Let's look at the results. Run this against AdventureWorks and AdventureWorksDW


    SELECT
    W.GroupNumber
    , COUNT_BIG(1) AS TotalTables
    , SUM(W.RowsCounted) AS GroupTotalRows
    FROM
#work AS W
    GROUP BY
    W.GroupNumber
    ORDER BY
    W.GroupNumber;


    SELECT
    W.GroupNumber
    , W.SchemaName
    , W.TableName
    , W.RowsCounted
    , COUNT_BIG(1) OVER (PARTITION BY W.GroupNumber ORDER BY (SELECT NULL)) AS TotalTables
    , SUM(W.RowsCounted) OVER (PARTITION BY W.GroupNumber ORDER BY (SELECT NULL)) AS GroupTotalRows
    FROM
#work AS W
    ORDER BY
    W.GroupNumber;

    For AdventureWorks (2014), I get a nice distribution across my 6 groups. 12 to 13 tables in each bucket and a total row count between 125777 and 128003. That's less than 2% variance between the high and low - I'll take it.

    If you rerun for AdventureWorksDW, it's a little more interesting. Our 6 groups are again filled with 5 to 6 tables but this time, group 1 is heavily skewed by the fact that FactProductInventory accounts for 73% of all the rows in the entire database. The other 5 tables in the group are the five smallest tables in the database.

I then ran this against our data warehouse-like environment. We had 1206 tables in there for 3,283,983,766 rows (3.3 billion). The query went from instantaneous to about 15 minutes, but now I've got a starting point for bucketing my tables into similarly sized groups.
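
To put the buckets to work, the last step is turning #work into a per-group worklist. A quick sketch - the REORGANIZE below is just a stand-in for whatever your "something" happens to be:

SELECT
W.GroupNumber
, CONCAT('ALTER INDEX ALL ON ', QUOTENAME(W.SchemaName), '.', QUOTENAME(W.TableName), ' REORGANIZE;') AS MaintenanceCommand
FROM
#work AS W
ORDER BY
W.GroupNumber
, W.RowsCounted DESC;

Each GroupNumber then becomes the queue for one agent job, SSIS package, or whatever else is doing the parallel work.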

    What do you think? How do you plan to use this? Do you have a different approach for figuring this out? I looked at R but without knowing what this activity is called, I couldn't find a function to perform the calculations.

    A date dimension for SQL Server


    The most common table you will find in a data warehouse will be the date dimension. There is no "right" implementation beyond what the customer needs to solve their business problem. I'm posting a date dimension for SQL Server that I generally find useful as a starting point in the hopes that I quit losing it. Perhaps you'll find it useful or can use the approach to build one more tailored to your environment.

As the comments indicate, this will create a DW schema and a table named DW.DimDate, and then populate the date dimension from 1900-01-01 to 2079-06-06, endpoints inclusive. I also patch in 9999-12-31 as a well known "unknown" date value. Sure, it's odd to have an incomplete year - this is your opportunity to tune the supplied code ;)


    -- At the conclusion of this script, there will be
    -- A schema named DW
    -- A table named DW.DimDate
    -- DW.DimDate will be populated with all the days between 1900-01-01 and 2079-06-06 (inclusive)
    -- and the sentinel date of 9999-12-31

IF NOT EXISTS
    (
    SELECT * FROM sys.schemas AS S WHERE S.name = 'DW'
    )
    BEGIN
    EXECUTE('CREATE SCHEMA DW AUTHORIZATION dbo;');
    END
    GO
IF NOT EXISTS
(
SELECT * FROM sys.schemas AS S INNER JOIN sys.tables AS T ON T.schema_id = S.schema_id
WHERE S.name = 'DW' AND T.name = 'DimDate'
    )
    BEGIN
CREATE TABLE DW.DimDate
(
DateSK int NOT NULL
, FullDate date NOT NULL
, CalendarYear int NOT NULL
, CalendarYearText char(4) NOT NULL
, CalendarMonth int NOT NULL
, CalendarMonthText varchar(12) NOT NULL
, CalendarDay int NOT NULL
, CalendarDayText char(2) NOT NULL
, CONSTRAINT PK_DW_DimDate
PRIMARY KEY CLUSTERED
    (
    DateSK ASC
    )
    WITH (ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE)
    , CONSTRAINT UQ_DW_DimDate UNIQUE (FullDate)
    );
    END
    GO
    WITH
    -- Define the start and the terminal value
    BOOKENDS(FirstDate, LastDate) AS (SELECT DATEFROMPARTS(1900,1,1), DATEFROMPARTS(9999,12,31))
    -- itzik ben gan rapid number generator
    -- Builds 65537 rows. Need more - follow the pattern
    -- Need fewer rows, add a top below
    , T0 AS
    (
    -- 2
    SELECT 1 AS n
UNION ALL SELECT 1
    )
    , T1 AS
    (
    -- 2^2 => 4
    SELECT 1 AS n
    FROM
    T0
    CROSS APPLY T0 AS TX
    )
    , T2 AS
    (
-- 4 * 4 => 16
    SELECT 1 AS n
    FROM
    T1
    CROSS APPLY T1 AS TX
    )
    , T3 AS
    (
-- 16 * 16 => 256
    SELECT 1 AS n
    FROM
    T2
    CROSS APPLY T2 AS TX
    )
    , T4 AS
    (
-- 256 * 256 => 65536
    -- or approx 179 years
    SELECT 1 AS n
    FROM
    T3
    CROSS APPLY T3 AS TX
    )
    , T5 AS
    (
-- 65536 * 65536 => basically infinity
    SELECT 1 AS n
    FROM
    T4
    CROSS APPLY T4 AS TX
    )
    -- Assume we now have enough numbers for our purpose
    , NUMBERS AS
    (
    -- Add a SELECT TOP (N) here if you need fewer rows
    SELECT
CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int) - 1 AS number
    FROM
    T4
    UNION
    -- Build End of time date
    -- Get an N value of 2958463 for
    -- 9999-12-31 assuming start date of 1900-01-01
    SELECT
    ABS(DATEDIFF(DAY, BE.LastDate, BE.FirstDate))
    FROM
    BOOKENDS AS BE
    )
    , DATES AS
    (
    SELECT
    PARTS.DateSk
    , FD.FullDate
    , PARTS.CalendarYear
    , PARTS.CalendarYearText
    , PARTS.CalendarMonth
    , PARTS.CalendarMonthText
    , PARTS.CalendarDay
    , PARTS.CalendarDayText
    FROM
    NUMBERS AS N
    CROSS APPLY
    (
    SELECT
    DATEADD(DAY, N.number, BE.FirstDate) AS FullDate
    FROM
    BOOKENDS AS BE
    )FD
    CROSS APPLY
    (
    SELECT
CAST(CONVERT(char(8), FD.FullDate, 112) AS int) AS DateSk
    , DATEPART(YEAR, FD.FullDate) AS [CalendarYear]
    , DATENAME(YEAR, FD.FullDate) AS [CalendarYearText]
    , DATEPART(MONTH, FD.FullDate) AS [CalendarMonth]
    , DATENAME(MONTH, FD.FullDate) AS [CalendarMonthText]
    , DATEPART(DAY, FD.FullDate) AS [CalendarDay]
    , DATENAME(DAY, FD.FullDate) AS [CalendarDayText]

    )PARTS
    )
INSERT INTO
    DW.DimDate
    (
    DateSK
    , FullDate
    , CalendarYear
    , CalendarYearText
    , CalendarMonth
    , CalendarMonthText
    , CalendarDay
    , CalendarDayText
    )
    SELECT
    D.DateSk
    , D.FullDate
    , D.CalendarYear
    , D.CalendarYearText
    , D.CalendarMonth
    , D.CalendarMonthText
    , D.CalendarDay
    , D.CalendarDayText
    FROM
    DATES AS D
WHERE NOT EXISTS
    (
    SELECT * FROM DW.DimDate AS DD
    WHERE DD.DateSK = D.DateSk
    );
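
A quick sanity check after the load - not part of the script above - shows the row count, the real bookends, and the sentinel:

SELECT
COUNT_BIG(1) AS TotalRows
, MIN(DD.FullDate) AS FirstDate
, MAX(CASE WHEN DD.FullDate < DATEFROMPARTS(9999, 12, 31) THEN DD.FullDate END) AS LastRealDate
, MAX(DD.FullDate) AS SentinelDate
FROM
DW.DimDate AS DD;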