
Workshop: SQL Server Big Data Clusters

Contributor: Martin
Purpose: Demo/Workshop
Updated date: 2020/03/26

Welcome to this Microsoft solutions workshop on the architecture of SQL Server Big Data Clusters. You'll experiment with SQL Server Big Data Clusters (BDC) and learn how you can use them to implement large-scale data processing and machine learning.

This Workshop assumes you have a full understanding of the concepts of big data analytics, the technologies that you will use throughout the Workshop (such as containers, Kubernetes, Spark and HDFS, and machine learning), and the architecture of a BDC. If you are not familiar with these topics, you can take a complete course here.

In this Workshop you'll learn how to create external tables over other data sources to unify your data, and how to use Spark to run large queries over your data in HDFS or to prepare data. You'll review a complete solution for an end-to-end scenario, with a focus on how to extrapolate what you have learned to create other solutions for your organization.
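To make the external-table idea concrete, here is a minimal Python sketch that only assembles the T-SQL a notebook cell might send to the master instance to expose CSV files stored in HDFS as an external table. It assumes a storage-pool data source named `SqlStoragePool` at `sqlhdfs://controller-svc/default` and a file format named `csv_file`; these names follow the pattern used in Microsoft's Big Data Clusters tutorials but are not taken from this Workshop's notebooks, and the table, column, and path values are hypothetical.

```python
def storage_pool_external_table(table: str, columns: dict[str, str], hdfs_path: str) -> str:
    """Build T-SQL that exposes CSV files in the BDC storage pool (HDFS) as an external table.

    Assumption: the storage pool is reachable through an external data source
    named SqlStoragePool at 'sqlhdfs://controller-svc/default'.
    """
    # Render each column as "[name] TYPE", one per line.
    column_list = ",\n    ".join(f"[{name}] {sql_type}" for name, sql_type in columns.items())
    return f"""\
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
    CREATE EXTERNAL DATA SOURCE SqlStoragePool
    WITH (LOCATION = 'sqlhdfs://controller-svc/default');

IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'csv_file')
    CREATE EXTERNAL FILE FORMAT csv_file
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2));

CREATE EXTERNAL TABLE [{table}] (
    {column_list}
)
WITH (DATA_SOURCE = SqlStoragePool, LOCATION = '{hdfs_path}', FILE_FORMAT = csv_file);
"""


# Hypothetical table, columns, and HDFS folder, for illustration only.
sql = storage_pool_external_table(
    "web_clickstreams_hdfs",
    {"wcs_click_date_sk": "BIGINT", "wcs_user_sk": "BIGINT"},
    "/clickstream_data",
)
print(sql)
```

Generating the statement text separately from executing it keeps the sketch independent of any particular SQL client; in the Workshop you would run the resulting T-SQL from Azure Data Studio against the master instance.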

This Workshop expects that you understand data structures, working with SQL Server, and computer networks. It does not expect you to have any prior data science knowledge, but a basic knowledge of statistics and data science is helpful in the Data Science sections. Knowledge of SQL Server, Azure Data and AI services, Python, and Jupyter Notebooks is recommended. AI techniques are implemented in Python packages. Solution templates are implemented using Azure services, development tools, and SDKs. You should have a basic understanding of working with the Microsoft Azure Platform.

You need to have all of the prerequisites completed before taking this Workshop.

You need a full SQL Server Big Data Cluster up and running, and you need to have identified its connection endpoints and all of its security parameters. You can find out how to do that here.

You will work through nine Jupyter Notebooks using the Azure Data Studio tool. Download them, open them in Azure Data Studio, and run only one cell at a time.
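Azure Data Studio runs the cells for you. If you want to preview what each code cell of a downloaded notebook will execute before stepping through it, a small Python sketch like the one below can list the code cells in order. It relies only on the fact that `.ipynb` files are JSON with a top-level `cells` list; the sample notebook it writes is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def code_cells(notebook_path: str) -> list[str]:
    """Return the source of each code cell in a .ipynb file, in document order.

    A .ipynb file is JSON with a top-level "cells" list; each cell has a
    "cell_type" and a "source" that is either a string or a list of lines.
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        sources.append(source if isinstance(source, str) else "".join(source))
    return sources


# Hypothetical two-cell notebook written to a temporary file for the demo.
sample = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Step 1"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["SELECT @@VERSION;"]},
    ],
}
with tempfile.NamedTemporaryFile("w", suffix=".ipynb", delete=False, encoding="utf-8") as f:
    json.dump(sample, f)

for number, source in enumerate(code_cells(f.name), start=1):
    print(f"--- cell {number} ---\n{source}")
```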

| Notebook | Topics |
| --- | --- |
| bdc-00-overview.ipynb | Overview of the Workshop and setup of the source data, problem space, solution options, and architectures |
| bdc-01-k8s.ipynb | In-depth details of a pod or other Kubernetes artifacts located in a SQL Server big data cluster |
| bdc-02-adstudio.ipynb | View service endpoints and the status of SQL Server big data cluster components |
| bdc-03-sqlserver-master.ipynb | Run standard SQL Server queries against the Master Instance (MI) in a SQL Server big data cluster |
| bdc-04-data-virtualization.ipynb | Learn how to create and query Virtualized Data in a SQL Server big data cluster |
| bdc-05-data-mart.ipynb | Create and query a Data Mart using Virtualized Data in a SQL Server big data cluster |
| bdc-06-spark-etl.ipynb | Learn how to work with Spark Jobs in a SQL Server big data cluster |
| bdc-07-spark-ml.ipynb | Train a Spark ML model in a SQL Server big data cluster and export it as an MLeap bundle |
| bdc-08-model-deployment.ipynb | Learn how to export and deploy an MLeap bundle in a SQL Server big data cluster |
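For the data mart topic (bdc-05), one common pattern is to create an external table in the data pool and fill it from virtualized data with `INSERT INTO ... SELECT`. The Python sketch below only generates that T-SQL. The data source name `SqlDataPool`, its `sqldatapool://controller-svc/default` location, the `ROUND_ROBIN` distribution, and the table, column, and query values are assumptions modeled on Microsoft's data-pool tutorials, not the notebook's actual code.

```python
def data_pool_data_mart(table: str, columns: dict[str, str], source_query: str) -> str:
    """Build T-SQL that creates a data-pool external table and loads it from a query.

    Assumptions: the data pool is addressed through an external data source
    named SqlDataPool at 'sqldatapool://controller-svc/default', and rows are
    spread across data pool instances with ROUND_ROBIN distribution.
    """
    # Render each column as "[name] TYPE", one per line.
    column_list = ",\n    ".join(f"[{name}] {sql_type}" for name, sql_type in columns.items())
    return f"""\
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'SqlDataPool')
    CREATE EXTERNAL DATA SOURCE SqlDataPool
    WITH (LOCATION = 'sqldatapool://controller-svc/default');

CREATE EXTERNAL TABLE [{table}] (
    {column_list}
)
WITH (DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN);

INSERT INTO [{table}]
{source_query};
"""


# Hypothetical example: aggregate clicks per user from a virtualized HDFS table.
sql = data_pool_data_mart(
    "clicks_data_pool",
    {"wcs_user_sk": "BIGINT", "clicks": "BIGINT"},
    "SELECT wcs_user_sk, COUNT_BIG(*) FROM web_clickstreams_hdfs GROUP BY wcs_user_sk",
)
print(sql)
```

The design choice here is to persist an aggregated result in the data pool so that repeated queries read the materialized data mart rather than rescanning the virtualized source each time.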
