Tuesday, 24 January 2017

HOW TO TRANSFER 3 MILLION OF ROWS OR 1.5GB OF DATA PER ONE MINUTE

Recently I had a huge task – improve ETL process between MS SQL servers – to transfer near billion of rows every day in a real time. This means, once a master SQL server receives new rows, the rows must be immediately transferred to 8 satellite MS SQL servers.
The back-end. Every MS SQL server running on one Intel Xeon E5-2690 2.9GHz 8 cores CPU, has 384GB of memory and enterprise SSD storage: separate RAID10 for data, index and log files. TempDB running on RAM drive (in memory). The main database table are separated in file-groups per day.
Before the improvement, I found around up to 800.000 rows per minute through-output. Since the amount of data were increasing every single day, the performance were too slow. By the way, the method to move the rows to satellite servers were to use stored procedure and to transfer the rows in chunks up to 1000000 of rows .
To improve the through-output, the decision were made to use SSIS package instead of USP, since SSIS is probably the best place for ETL processes. And of course to multi-thread the process. So after a lot of tests the best performance we reached using the data flow bellow:
SSIS package Data Flow
The most important part is Balance Data Distributor. Using the component, the SSIS package were able to write in 4 separate threads into one table. This helped a lot and every additional thread gave additional 500000 of rows per minute through-output (using single thread without the component, SSIS were able to reach up to 1000000 rows per minute). So the total through-output are up to 3 million of rows or 1.5GB of data per minute in the real time from the master MS SQL server to 8 satellite MS SQL servers.
Hope this will help someone to save a lot of time and provides ability to have really fast solution to move huge amount of data between servers, etc. prepared in a short time.
If you need any help on details, leave a comment or write me an email!

How do I insert multiple rows WITHOUT repeating the “INSERT INTO dbo.Blah” part of the statement?

First Option
Syntax works in SQL Server 2008 (but not in SQL Server 2005)
CREATE TABLE MyTable (id int, name char(10));

INSERT INTO MyTable (id, name) VALUES (1, 'Bob'), (2, 'Peter'), (3, 'Joe');

SELECT * FROM MyTable;

id |  name
---+---------
1  |  Bob       
2  |  Peter     
3  |  Joe   
Second Option
INSERT INTO dbo.MyTable (ID, Name) 
select * from
(
 select 123, 'Timmy'
  union all
 select 124, 'Jonny' 
  union all
 select 125, 'Sally'
 ...
) x    

SQL Server XML Querying

Problem
We have XML data in a SQL Server database that we want to search efficiently, but are not sure of the best way to get the results we want. We need to find specific values in certain XML nodes- this will involve finding node values that match given substrings.  Check out this tip to learn more.
Solution
We will be working with the AdventureWorks2012 sample database - querying the Demographics XML column in the Person.Person table. To see what the Demographics XML data looks like, let's pull a sample Demographics field by selecting a random record:
--ordering by NEWID() function randomizes the sort order
SELECT TOP 1 Demographics 
FROM AdventureWorks2012.Person.Person
ORDER BY NEWID() 
GO

If we expand the XML contents of this field, we see:
querying the Demographics XML column in the Person.Person table
It's evident that we are dealing with typed XML, since we can see a namespace declaration (xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey") inside the root node. This means that the XML data is associated with an XML schema that enforces data type and validation rules. We want to get a count of the surveys where the individual's education is listed as ‘Bachelors'. How should we do this? We can try a rather primitive method that finds the existence of substrings anywhere in the XML column by using the CHARINDEX function:
--filtering XML data using CHARINDEX function
SELECT COUNT(1)
FROM Person.Person
WHERE CHARINDEX('Bachelors',CONVERT(VARCHAR(MAX),Demographics),1) > 0
Querying with XQuery
This does return a count of the number of times the word ‘Bachelors' appears. However, it's quite clumsy - and worse yet, doesn't guarantee that the text ‘Bachelors' is a real XML value in the table - it could also be part of a node name. A much cleaner solution is to use XQuery.

Querying SQL Server Data with XQuery

XQuery is a language designed for querying XML data, and is not proprietary to SQL Server - it is used by many relational database management systems (RDBMS). A simple implementation of an XQuery solution is as follows:
--filtering XML data using XQuery
;WITH XMLNAMESPACES 
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey' AS ns)
SELECT COUNT(1)
FROM Person.Person
WHERE Demographics.exist('(/ns:IndividualSurvey/ns:Education[.="Bachelors"])')=1
GO

This may look a little complicated, but really isn't too hard to decipher. The WITH XMLNAMESPACES statement is needed to explicitly declare the namespace for in order to reference any XML node element or attribute, which we did in the WHERE clause (the Education node element). Another way to declare the namespace is to put the declaration in-line with the actual XQuery method that is being used:
--filtering XML data using XQuery with in-line namespace declaration
 SELECT COUNT(1)
FROM Person.Person
WHERE Demographics.exist('declare namespace ns=
"http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"; 
(/ns:IndividualSurvey/ns:Education[.="Bachelors"])')=1
GO


In both examples, we declared a namespace and aliased it as ‘ns', which is used as a prefix for each path step or node reference. We listed the node path steps from root node (IndividualSurvey) to destination node (Education).

Using the SQL Server XQuery exist() method

The exist() method is used in our query as the WHERE criterion to filter the XML data, finding XML records where the Education node's value is equal to the text ‘Bachelors'. The exist() method returns a bit value of 1 if the method finds a non-empty node whose element or attribute value matches the given criteria. Upon running the query, however, we get a record count of zero
What happened? We've assumed too much - that the value we are looking for in the Education node will exactly match the text ‘Bachelors'. If we look at the sample Demographics XML fragment closely, we see that there is a space after the word. So, if we modify our query and run it again, we get the following:
--filtering XML data using XQuery - adding a space to the string
;WITH XMLNAMESPACES 
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey' AS ns)
SELECT COUNT(1)
FROM Person.Person
WHERE Demographics.exist('(/ns:IndividualSurvey/ns:Education[.="Bachelors "])')=1
GO
Using the contains() function

Using the SQL Server XQuery contains() function

How can we prevent this oversight from happening again? XQuery in SQL Server makes use of a contains() function that finds a string as a whole substring in a node value. The contains() function requires two arguments:
    arg1 - String value to test  arg2 - Substring to look for
Using the contains() function, we don't have to be concerned with matching the string exactly:
--filtering XML data using the XQuery contains() function
;WITH XMLNAMESPACES 
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey' AS ns)
SELECT COUNT(1)
FROM Person.Person
WHERE Demographics.exist('(/ns:IndividualSurvey/ns:Education[contains(.,"Bachelors")])')=1
GO
XQuery in SQL Server makes use of a contains() function
We can easily expand our filter by additionally specifying that only records where the Gender node contains ‘F' are returned:
--filtering XML data using the XQuery contains() function in multiple WHERE criteria
;WITH XMLNAMESPACES 
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey' AS ns)
SELECT COUNT(1)
FROM Person.Person
WHERE Demographics.exist('(/ns:IndividualSurvey/ns:Education[contains(.,"Bachelors")])')=1
AND Demographics.exist('(/ns:IndividualSurvey/ns:Gender[contains(.,"F")])')=1
GO
Using the value() method
Our contains() function instances use what is called a self node axis abbreviation (‘.') as their first argument. This specifies that we want to search in the current node context (Gender or Education). The second argument is for the actual string we are searching for (‘Bachelors' or ‘F').

Using the SQL Server XQuery value() method

We assume that our query is returning the records that meet our criteria, but what if we want to see the actual values to verify this? Another XQuery method we can use for this is value(). The syntax for the method is as follows:
    value (XQuery, SQLType)
We'll employ this method to return node values for the Education and Gender nodes (only the first 3 records), along with the non-XML LastName field:
;WITH XMLNAMESPACES 
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey' AS ns)
SELECT TOP 3 
LastName,
Demographics.value('(/ns:IndividualSurvey/ns:Education)[1]', 'varchar(50)') AS Education,
Demographics.value('(/ns:IndividualSurvey/ns:Gender)[1]', 'char(1)') AS Gender
FROM Person.Person
WHERE Demographics.exist('(/ns:IndividualSurvey/ns:Education[contains(.,"Bachelors")])')=1
 AND Demographics.exist('(/ns:IndividualSurvey/ns:Gender[contains(.,"F")])')=1
GO
We've looked briefly at some basic ways to query XML data using XQuery
The value() method requires an XQuery path argument like the exist() method, but also needs a SQL Server data type argument. We are using ‘varchar(50)' and ‘char(1)' as the SQL Server data types for the Education and Gender node values.

Singleton Requirement

The value() method also enforces a singleton rule - requiring that the node element or attribute instance in a sequence be explicitly specified, even if there is only one. For example, there is only one Education node in our XML fragment, but we still have to designate a singleton by placing the ‘[1]' at the end of the value() method, thus indicating that we want the first Education node.

Summary

We've looked briefly at some basic ways to query XML data using XQuery. We introduced the exist() method as an effective XQuery method to use in the WHERE clause. We also experimented with the XQuery contains()function, using it to count XML records having a precise node string value match. Finally, we used a very common XQuery method - the value() method - to return XML values.