SQL Server 2017 for Analytics

Three weeks ago Microsoft announced a large pack of new features for its data platform: extended R functionality and added Python support, GPU-accelerated analytics, a built-in graph database, and Linux support. Currently, the latest version is CTP 2.0 (Community Technology Preview), and the anticipated release date is mid-2017. Let’s examine what updates Microsoft has prepared for analytics.


1. Built-in graph extensions

This one was the biggest surprise to me. The new SQL Server offers [graph database capabilities](https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-overview) to model many-to-many relationships, and the graph relationships are integrated into T-SQL. Is it Neo4j built into SQL Server? No. It is functionality developed by Microsoft and built into the core SQL Server engine. There is a great blog post and a demo, while here you might find more details from Microsoft.
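To make the idea concrete, here is a minimal sketch of the node-and-edge modeling that graph tables are built around. In SQL Server 2017 the tables would be declared with `CREATE TABLE ... AS NODE` and `AS EDGE` and queried with a `MATCH` clause. I have not run that against a server here, so the sketch uses Python's built-in sqlite3 and plain joins as a stand-in. The table names (`person`, `friend_of`) and the data are made up for illustration.

```python
import sqlite3

# Stand-in for SQL Server graph tables: sqlite3 has no AS NODE / AS EDGE,
# so nodes and edges are ordinary tables here (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);   -- node table
    CREATE TABLE friend_of (from_id INTEGER, to_id INTEGER);   -- edge table
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO friend_of VALUES (1, 2), (2, 3), (1, 3);
""")

def friends_of(name):
    """Names reachable from `name` over exactly one friend_of edge."""
    rows = conn.execute(
        """
        SELECT p2.name
        FROM person AS p1
        JOIN friend_of AS f ON f.from_id = p1.id
        JOIN person AS p2 ON p2.id = f.to_id
        WHERE p1.name = ?
        ORDER BY p2.name
        """,
        (name,),
    ).fetchall()
    return [r[0] for r in rows]

def friends_of_friends(name):
    """Names reachable from `name` over exactly two friend_of edges."""
    rows = conn.execute(
        """
        SELECT DISTINCT p3.name
        FROM person AS p1
        JOIN friend_of AS f1 ON f1.from_id = p1.id
        JOIN friend_of AS f2 ON f2.from_id = f1.to_id
        JOIN person AS p3 ON p3.id = f2.to_id
        WHERE p1.name = ?
        ORDER BY p3.name
        """,
        (name,),
    ).fetchall()
    return [r[0] for r in rows]

print(friends_of("Alice"))
print(friends_of_friends("Alice"))
```

Every extra hop costs another pair of joins in this plain-SQL version. With the graph extensions, the join conditions are replaced by a pattern in the WHERE clause, along the lines of `MATCH(p1-(f)->p2)`, which is where the feature should pay off for many-to-many data.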

2. Improved support for R and added Python

R Services, the feature name in SQL Server 2016, has been changed to [Machine Learning Services](https://docs.microsoft.com/en-us/sql/advanced-analytics/r/r-services) to reflect support for the Python language. In CTP 2.0 the service is available only on Windows; Linux support for Machine Learning Services should come later this year.

Python

[Python tools](https://docs.microsoft.com/en-us/sql/advanced-analytics/python/sql-server-python-services) can be installed by running the SQL Server setup wizard and selecting the right language. Currently, SQL Server 2017 CTP 2.0 includes a portion of the Anaconda distribution and Python 3.5. Because support for Python is a pre-release feature and still under development, it includes only a subset of the available R functionality. According to Microsoft, future [additions to Python](https://docs.microsoft.com/en-us/sql/advanced-analytics/python/python-interoperability) will include the Microsoft Cognitive Toolkit, a library that supports a variety of neural network models, including convolutional networks (CNN), recurrent networks (RNN), and Long Short-Term Memory networks (LSTM).
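For a feel of what in-database Python looks like, here is a small sketch. Machine Learning Services runs scripts through the `sp_execute_external_script` stored procedure, which hands the result of a query to the script as `InputDataSet` and reads the result back from `OutputDataSet`. The helper below only builds the T-SQL text. The function names, the `dbo.sales` table, and the `amount` column are my own placeholders, and actually sending the statement to a server is left out.

```python
def quote_nstring(text):
    """Wrap text as a T-SQL Unicode literal, doubling embedded single quotes."""
    return "N'" + text.replace("'", "''") + "'"

def build_external_script_call(script, input_query):
    """Build an EXEC statement that runs `script` over the rows of `input_query`.

    Hypothetical helper for illustration; it does not talk to a server.
    """
    return (
        "EXEC sp_execute_external_script\n"
        "    @language = N'Python',\n"
        f"    @script = {quote_nstring(script)},\n"
        f"    @input_data_1 = {quote_nstring(input_query)};"
    )

# Inside the server, InputDataSet arrives as a pandas DataFrame.
# The table and column names below are assumptions, not from any real schema.
py_script = "OutputDataSet = InputDataSet[InputDataSet['amount'] > 100]"
tsql = build_external_script_call(py_script, "SELECT amount FROM dbo.sales")
print(tsql)
```

Note that the script travels as a string literal, so single quotes inside the Python code have to be doubled. On a real instance, external scripts also have to be enabled first through the `external scripts enabled` server option.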

R tools

The latest version of Microsoft R (version 9.1.0) is available. SQL Server 2017 also provides improved [package management capabilities](https://docs.microsoft.com/en-us/sql/advanced-analytics/r/r-package-management-for-sql-server-r-services), which should make life easier for data scientists by simplifying package backups and new package installations. DBAs will be happy as well, because permissions management for R tools is much easier now. In CTP 2.0, MicrosoftML includes new image and text featurization functions, as well as support for parallelizable models with rxExecBy.
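The idea behind rxExecBy, as I understand it, is to split the input by one or more key columns and run a user-supplied function on every partition, possibly in parallel. Below is a plain-Python analogy of that pattern, not the RevoScaleR API: `exec_by`, `mean_amount`, and the sample data are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def exec_by(rows, key, func):
    """Partition `rows` (a list of dicts) by `key` and apply `func` to each partition.

    Partitions are processed by a thread pool; the result maps each key value
    to whatever `func` returned for that partition.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    keys = sorted(partitions)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(func, (partitions[k] for k in keys)))
    return dict(zip(keys, results))

def mean_amount(partition):
    """Toy per-partition 'model': the average of the amount column."""
    return sum(r["amount"] for r in partition) / len(partition)

sales = [
    {"region": "EU", "amount": 10.0},
    {"region": "US", "amount": 30.0},
    {"region": "EU", "amount": 20.0},
]
per_region = exec_by(sales, "region", mean_amount)
print(per_region)
```

In a real workload the per-partition function would fit a model rather than compute a mean, which is what makes the pattern attractive for training many small models side by side.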

3. GPU-accelerated analytics

SQL Server 2017 can also leverage NVIDIA GPU-accelerated computing through the Python/R interface. Developers can implement GPU-accelerated analytics and very sophisticated AI directly in the database server as stored procedures. According to the documentation, the neural network algorithm can use GPU acceleration and is accessible from the R interface. I expect this will be expanded in future releases. Where do I get a server with GPU capabilities? One option is the [Azure VM N series](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/series/).

4. Linux support

SQL Server 2017 is available for Windows, Linux, and Docker containers. It is a significant step towards making SQL Server an extremely powerful platform. In CTP 2.0, the Linux version lags behind Windows. For now, it seems that only the database engine is available on Linux, while all other services, including machine learning, are still under development.

See the [Linux release notes](https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-release-notes). By the way, there is no need to install it on Linux on your own: Azure offers a CTP 2.0 image on Red Hat.

Overview

In the past, a common application pattern was to create statistical and analytical models outside the database and deploy these models in custom-built production systems. That resulted in a lot of heavy lifting for developers, and the development and deployment lifecycle could take months. Now it seems Microsoft is confident it can take analytics to a completely different level.

I can’t wait to experiment with those services and see how it all integrates into a single architecture. Stay tuned!
