HomeLandscapeAbout me

Making Data Scientists Productive in Azure

By Valdas Maksimavicius
Published in Data Architecture
March 21, 2021
8 min read
Making Data Scientists Productive in Azure

Doing data science today is far more difficult than it will be in the next 5-10 years. Sharing, collaborating on workflows in painful, pushing models into production is challenging. Let’s explore what Azure provides to ease Data Scientists’ pains.

This article is a part of my Azure Data Platform Landscape overview project.

43cdf379ca41169897c02ae8841f94e9b6cbfcdf

In this post, you will learn about Azure Machine Learning Studio, Azure Machine Learning, Azure Databricks, Data Science Virtual Machine, and Cognitive Services. What tools and services can we choose based on a problem definition, skillset or infrastructure requirements?

“One thing about Microsoft - they have many ways to solve the same problem”

Picking a good name for your classes, methods or variables is essential (and difficult). Finding a good name for a product or service seems to be even more challenging. When I look at the Azure service names (“machine learning” this, “machine learning” that), it is clear that even big companies, like Microsoft, have difficulties finding catchy and straightforward names.

As a result, there are many different services with similar names. For example, what is the difference between Azure Machine Learning Service and Azure Machine Learning Studio? Is Microsoft Machine Learning Server the same thing as Data Science Virtual Machine? Let’s find out!

Agenda:

  • Azure Cognitive Services
  • ML.NET
  • Azure Machine Learning Studio
  • Power BI Auto ML
  • Azure Machine Learning
  • Azure Synapse Analytics
  • Azure Databricks
  • Data Science Virtual Machine
  • Microsoft Machine Learning Server
  • SQL Server Machine Learning Services

What do I mean by saying “Making Data Scientists Productive in Azure”?

607c27252c52189f2dc991db568bbb8cffe8546e

Matei Zaharia, the author of Spark, in one of his presentations pointed out the main aspects of the machine learning lifecycle.

In the underlying machine learning lifecycle, we start with data. Later, we run data preparation scripts, model training, and model deployment. Then, if our application is doing anything important, we want to monitor it to see how it’s doing, collect extra data and feed it back into this process again. Each step has many tools that often need tuning for better results and performance. Finding what parameters were used at each stage to get a specific result is essential to be able to experiment with all. Everything needs to happen at scale.

By “productive” in the title of this post, I mean a collection of well-integrated tools that support the whole machine learning lifecycle.

Azure Cognitive Services

What is it?

Azure services with pre-built AI and ML models

What can you do with it?

Add intelligent features to your apps

Azure Cognitive Services is a powerful capability that allows software developers (no machine learning knowledge required) use state of the art machine learning models and integrate with other applications by calling APIs or importing SDKs.

2d3f085e2bcbf9e6413287948de5beca91642c38

Azure Cognitive Services lets to build apps with powerful algorithms using a few lines of code, run across devices and platforms such as iOS, Android, and Windows. Cognitive Services continually expands with new features. Many services offer free demos.

ecb7a78dcf750b672ada251dbd05dc888a8c6fcd

For example, here is a face detection API returning my face parameters. You find attributes like hair color, smile, gender. But the first property is BALD: 0.17! By the way, increased by 4 percentage points since the last year :)

Azure Cognitive Services - Summary

Key benefits:

  • Minimal development effort
  • Easy integration via HTTP REST
  • Built-in integrations with other Azure services
  • Containers support
  • Azure Virtual Network for enhanced data security

Considerations:

  • Limited customization allowed
  • Limited support for less popular languages, e.g. Lithuanian, Estonian, etc.

ML NET

What is it?

An open-source and cross-platform ML framework

What can you do with it?

Create custom ML models using C# or F# without leaving the .NET ecosystem

420e6e67c3748a74a0e79d39fb182f89456281bf

ML.NET - Summary

Key benefits:

  • High performance
  • AutoML functionality
  • Leverage TensorFlow or ONNX
  • Expose a model via an ASP.NET Core Web API
  • Integrate with Spark via .NET for Apache Spark (preview)
  • Use ML.NET in Jupyter Notebooks (preview)

Considerations:

  • Limited support for popular ML libraries (e.g. Scikit-learn, NumPy)

Azure Machine Learning Studio

What is it?

Drag-and-drop visual interface for ML

What can you do with it?

Build, experiment and deploy models using pre-configured algorithms

Azure Machine Learning Studio (ML Studio) is a collaborative, drag-and-drop visual workspace where you can build, test, and deploy machine learning solutions without needing to write code. It uses pre-built and pre-configured machine learning algorithms and data-handling modules. Business analysts/statisticians without R/Python knowledge would be productive with this tool.

Azure Machine Learning Studio is an impressive service, that can make people productive quickly.|

Use ML Studio when you want to experiment with machine learning models quickly and easily, and the built-in machine learning algorithms are enough for your solutions.

The whole experiment looks like a graph, with inputs at the top and outputs (predictions) at the bottom. In the example above, “Binary Classification: Direct marketing”, I compare two algorithms (two-class boosted decision tree and two-class support vector machine), and the tool makes it easy to deploy a better performing model as a web service.

54e3a728c1d44ece76f43af3de8c8cbb9456db6a

Azure Machine Learning Studio - Summary

Key benefits:

  • Interactive visual interface
  • Built-in Jupyter Notebooks for data exploration
  • Direct deployment of trained models as web services
  • Built-in integrations with other Azure services

Considerations:

  • Online only
  • Limited number of supported input and output connectors
  • Limited support for custom Python/R code

Power BI Auto ML

What is it?

Auto Machine Learning component built into Power BI to build ML models without any code

What can you do with it?

Using AutoML in Power BI, business analysts without a strong background in machine learning can build ML models.

b7446654fcee13fad12f3d5a717c60a4fc86f73b

bbdf59bf74ea0510722483cf9eba31a54708154a

Power BI Auto Machine Learning - Summary

Key benefits:

  • Use Power BI dataflows to load data, transform it and build models on top of it
  • Deploy models as services via Azure ML
  • Get top predictors during training and explanations for each prediction

Considerations:

  • Limited selection of algorithms (binary prediction, general classification, regression)
  • Paid Pro or Premium license needed

Azure Machine Learning

What is it?

Managed cloud service for ML

What can you do with it?

Train, deploy and manage models in Azure

First of all, be aware that we discuss now Azure Machine Learning, NOT STUDIO (presented earlier).

Azure Machine Learning (Azure ML) provides a cloud-based environment you can use to develop, train, test, deploy, manage, and track machine learning models. It supports open-source technologies so that you can use Python open-source packages with machine learning components.

00c85a7921f4d46fc64f53c64802478390ff476d

By using Azure ML, you can start training on your local machine and then scale out to the cloud. With many available compute targets, and with advanced hyperparameter tuning services, you can build better models faster by using the power of the cloud.

f15fc5846c04a410b1978be7b517fdfb02edb7bc

Azure ML supports the whole cycle, from data ingestion to deployment using Docker containers. Data should be available in Azure Blob Storage. For data preparation and training you can use any Python open-source package. For deployment, the easiest setup is achievable with Azure Container Instances or Azure Kubernetes Service.

a8179a8e57e7298e5153fe47b4cac54c7b48862a

Azure Machine Learning - Summary

Key benefits:

  • Native integration with Azure Synapse Analytics
  • Central management of scripts and run history
  • Run model training scripts locally (offline), and then scale out to the cloud
  • Management and deployment of models to the cloud or edge devices
  • Integration with Azure DevOps

Considerations:

  • Investigate MLflow to track metrics and manage models

Azure Synapse Analytics

What is it?

A limitless analytics service that brings together enterprise data warehousing and Big Data analytics

What can you do with it?

Query query, build pipelines, develop reports and dashboards, use notebooks, and create machine learning models.

At a high level, Azure Synapse Analytics helps with:

  • (Business understanding)
  • Data acquisition and understanding
  • Modeling
  • Model deployment and scoring

You can enrich your data in Spark tables with new machine learning models that you train by using automated machine learning. In Azure Synapse Analytics, you can select a Spark table in the workspace to use as a training dataset for building machine learning models, and you can do this in a code-free experience

3bfe6288fa6f82cbd3a157843903b798df1dcb20

3ccebc0682cde22401d4578feb2a481b6154c0df

d4590cc60c5103e72c647e64cf5475218b88b663

Another option allows you to enrich your data in Azure Synapse Analytics with Azure Cognitive Services

da590a27198ad8674b5c1547c78d824a969ad4d3

86a9cfdd1e2e371d7a6290d85ced64ae5d6c9d2c

Azure Synapse Analytics - Summary

Key benefits:

  • Unified platform for various personas and different workloads
  • Native integration with Azure Machine Learning
  • Native integration with Azure Cognitive Services

Considerations:

  • Limited Python and Spark capabilities compared to Azure Databricks

Azure Databricks

What is it?

Spark-based analytics platform

What can you do with it?

Build and deploy models and data workflows

bccc8e4fcbc2e0b8bbc17296e23436841b8d6420

Databricks provides a managed cloud platform built around Spark that delivers 1) fully managed Spark clusters, 2) an interactive workspace for exploration and visualization, 3) a production pipeline scheduler, and 4) a platform for powering your Spark-based applications

5eeef36070ed9c215d4a20456d30d87db47f39df

The main concepts:

  • Databricks Runtime (Apache Spark, concurrent clusters, REST APIs, libraries)
  • Collaborative workspace (notebooks, user access, git integration)
  • Deploy Jobs & Workflows (job scheduler, notifications & logs, multi-stage pipelines)
  • Security (single sign-on (SSO), access control list (ACL), secrets via Azure Key Vault)
  • DeltaLake for data storage metamodel
  • MLFlow for ML model management

1d9b961f6e8e3be5180277ed307d067ac517d9db

Azure Databricks, with the help of extra libraries and services, supports the complete machine learning cycle.

Azure Databricks - Summary

Key benefits:

  • The most mature development environment for ML on the Azure platform
  • Seamless integration with MLflow & Azure ML
  • Integrated with other Azure services (e.g., Azure Data Factory, Azure Key Vault)
  • Delta Lake support

Considerations:

  • Online only
  • Cost includes the price of virtual machines and Databricks fee

Data Science Virtual Machine

What is it?

An Azure virtual machine with pre-installed data science tools

What can you do with it?

Develop ML solutions in a pre-configured environment

Data Science Virtual Machine (DSVM) is a pre-installed and pre-configured set of images for Windows or Linux virtual machines. DSVM includes the most popular data science tools. Since it has access to the full potential of Azure networking and scalability, DSVM can be a great environment even for data science teams.

e65c89513a06ce843129ff8fbeb4ce61ada53750

Data Science Virtual Machine can be useful for learning and comparing different machine learning tools.

Data Science Virtual Machine - Summary

Key benefits:

  • The most complete development environment for ML on the Azure platform
  • Reduced time to install, manage, and troubleshoot data science tools and frameworks
  • Included the latest versions of all commonly used tools and frameworks
  • Virtual machine options include scalable GPU images

Considerations:

  • Online only
  • Infrastructure as a service (IaaS), not a managed data science solution

ab141abdb275f798d7553effd35a42c299d28d2c

If you can’t use Azure, then I suggest you look into Microsoft Machine Learning Server and SQL Server Machine Learning Services.

Microsoft Machine Learning Server

What is it?

Cross-platform standalone server for predictive analysis

What can you do with it?

Build and deploy models written in R or Python

In September 2017, Microsoft R Server was released under the new name of Microsoft Machine Learning Server (because of added Python support). Microsoft Machine Learning Server (ML Server) is a flexible choice for analyzing data at scale, building intelligent apps, and discovering insights. It includes a collection of R packages, Python packages, interpreters, and infrastructure for developing and deploying distributed R and Python-based machine learning solutions on a range of platforms across on-premises and cloud.

4503c4f0c0f64ec5f0d9aca6a012068f37c82b1e

ML Server offers best-in-class operationalization - from the time a machine learning model is completed, it takes just a few clicks to generate web services APIs. These web services are hosted on a server grid on-premises or in the cloud and can be integrated with line-of-business applications. Additionally, ML Server integrates seamlessly with Active Directory and Azure Active Directory and includes role-based access control to satisfy security and compliance needs of your enterprise.

982da76106686b8fb93072e2ca6269d122bf7eca

ML Server has full support for the data science lifecycle of R and Python-based analytics.

Microsoft Machine Learning Server - Summary

Key benefits:

  • Built on a legacy of Microsoft R Server and Revolution R Enterprise
  • Advanced security options
  • Deploy R and Python models as web services

Considerations:

  • You need to deploy and manage Machine Learning Server in your enterprise

SQL Server Machine Learning Services

What is it?

A built-in SQL Server feature to support machine learning

What can you do with it?

Execute Python and R scripts with relational data

SQL Server Machine Learning Services - Summary

Key benefits:

  • Run your scripts where the data resides and eliminate transfer of the data across the network to another server
  • Encapsulate predictive logic in a database function or as a library
  • Use base distributions of Python, R, and Java (extensibility framework)

Considerations:

  • Assumes a SQL Server database as the data tier for your application
  • Limited scalability
  • A long list of known issues

Summary

Overall, it seems that Azure Machine Learning service is expanding really fast and offers amazing Data Science capabilities. Azure Databricks provides amazing data engineering capabilities.


Tags

#azure#machine learning#data science

Share

Previous Article
Software Is About Developing Knowledge More Than Writing Code
Valdas Maksimavicius

Valdas Maksimavicius

Data & Analytics Leader

Topics

Data Architecture
Data Engineering
Data Governance
Miscellaneous

Related Posts

Launching Databricks at If Insurance | Medium
April 24, 2021
1 min

Quick Links

About mePrivacyContactLandscape

Social Media