azure / data-product-streaming

Template to deploy a Data Product for data stream processing into a Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.

License: MIT License

PowerShell 8.27% Dockerfile 0.63% Shell 22.40% Bicep 68.69%
arm azure architecture data-platform enterprise-scale policy-driven bicep data-mesh data-fabric enterprise-scale-analytics

data-product-streaming's Introduction

Cloud-scale Analytics Scenario - Data Product Streaming

Objective

The Cloud-scale Analytics Scenario provides a prescriptive data platform design coupled with Azure best practices and design principles. These principles serve as a compass for subsequent design decisions across critical technical domains. The architecture will continue to evolve alongside the Azure platform and is ultimately driven by the various design decisions that organizations must make to define their Azure data journey.

The Cloud-scale Analytics architecture consists of two core building blocks:

  1. Data Management Landing Zone which provides all data management and data governance capabilities for the data platform of an organization.
  2. Data Landing Zone which is a logical construct and a unit of scale in the Cloud-scale Analytics architecture that enables data retention and execution of data workloads for generating insights and value with data.

The architecture is modular by design. Organizations can start small with a single Data Management Landing Zone and Data Landing Zone, and can scale to a multi-subscription data platform environment by adding more Data Landing Zones. The reference design thereby supports different modern data platform patterns such as data-mesh and data-fabric as well as traditional data lake architectures. The Cloud-scale Analytics Scenario is closely aligned with the data-mesh approach and is well suited to help organizations build data products and share them across business units. If the core recommendations are followed, the resulting target architecture puts the customer on a path to sustainable scale.

Cloud-scale Analytics


The Cloud-scale Analytics architecture represents the strategic design path and target technical state for your Azure data platform.


This repository describes a Data Product template for Data Streaming that can also be used for integrating streaming data into the Azure data platform. Data Products are another unit of scale inside a Data Landing Zone, implemented by means of Resource Groups. Resource Groups inside the Data Landing Zone subscription are created and handed over to cross-functional teams to give them an environment in which they can work on their own data use-cases. Ownership of the resource group and operation of the services within it pass to the Data Product teams. To enable self-service, the owning teams are free to deploy their own services within the guardrails set by Azure Policy.

Repository templates help these teams scale more quickly within an organization and roll out common data analysis patterns not just once but multiple times across various use-cases. Ownership of the templates is also handed over, which gives these teams a starting point while allowing them to enhance the template based on their specific requirements.

This Data Product template deploys a set of services that can be used for real-time data processing and integration. The template includes services such as Event Hubs, IoT Hub, Stream Analytics and Azure Synapse. The Data Product teams can then use these tools to generate insights and value with data.

Note: Before getting started with the deployment, please make sure you are familiar with the complementary documentation in the Cloud Adoption Framework. Also, before deploying your first Data Product, please make sure that you have deployed a Data Management Landing Zone and at least one Data Landing Zone. The minimal recommended setup consists of a single Data Management Landing Zone and a single Data Landing Zone.

Deploy Cloud-scale Analytics Scenario

The Cloud-scale Analytics architecture is modular by design and allows customers to start with a small footprint and grow over time. To avoid a later migration project, customers should decide upfront how they want to organize data domains across Data Landing Zones. All Cloud-scale Analytics architecture building blocks can be deployed through the Azure Portal as well as through GitHub Actions workflows and Azure DevOps Pipelines. The template repositories contain sample YAML pipelines that help you get started with the setup of the environments more quickly.

Reference implementation | Description | Deploy to Azure | Link
Cloud-scale Analytics Scenario | Deploys a Data Management Landing Zone and one or multiple Data Landing Zones all at once. Provides fewer options than the individual Data Management Landing Zone and Data Landing Zone deployment options. Helps you to quickly get started and familiarize yourself with the reference design. For more advanced scenarios, please deploy the artifacts individually. | Deploy To Azure |
Data Management Landing Zone | Deploys a single Data Management Landing Zone to a subscription. | Deploy To Azure | Repository
Data Landing Zone | Deploys a single Data Landing Zone to a subscription. Please deploy a Data Management Landing Zone first. | Deploy To Azure | Repository
Data Product Batch | Deploys a Data Workload template for Data Batch Analysis to a resource group inside a Data Landing Zone. Please deploy a Data Management Landing Zone and Data Landing Zone first. | Deploy To Azure | Repository
Data Product Streaming | Deploys a Data Workload template for Data Streaming Analysis to a resource group inside a Data Landing Zone. Please deploy a Data Management Landing Zone and Data Landing Zone first. | Deploy To Azure | Repository
Data Product Analytics | Deploys a Data Workload template for Data Analytics and Data Science to a resource group inside a Data Landing Zone. Please deploy a Data Management Landing Zone and Data Landing Zone first. | Deploy To Azure | Repository

Deploy Data Product

To deploy the Data Product into your Data Landing Zone, please follow the step-by-step instructions:

  1. Prerequisites
  2. Create repository
  3. Setting up Service Principal
  4. Template Deployment
    1. GitHub Action Deployment
    2. Azure DevOps Deployment
  5. Known Issues
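
For teams that want to try the template outside a pipeline, the deployment step could be sketched with the Azure CLI roughly as follows. This is only an illustration and not the workflow documented in the steps above: the function name deploy_data_product and the resource group, template file and parameter file passed to it are hypothetical placeholders, and the sketch assumes you are already signed in with az login.

```shell
#!/usr/bin/env bash

# Validate a template against a resource group and deploy it only if the
# validation succeeds.
#   $1  resource group handed over to the Data Product team (placeholder)
#   $2  path to the template file (placeholder)
#   $3  path to the parameter file (placeholder)
deploy_data_product() {
    local resource_group="$1" template_file="$2" parameter_file="$3"

    # Run validation first so that obvious template or parameter errors
    # surface before any resources are created.
    az deployment group validate \
        --resource-group "$resource_group" \
        --template-file "$template_file" \
        --parameters "@$parameter_file" \
        --output none || return 1

    # Start the actual resource-group-scoped deployment.
    az deployment group create \
        --resource-group "$resource_group" \
        --template-file "$template_file" \
        --parameters "@$parameter_file"
}
```

A call would look like deploy_data_product <resource-group> <template-file> <parameter-file>. Validating before creating keeps a malformed parameter file from leaving a half-deployed resource group behind.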

Contributing

Please review the Contributor's Guide for more information on how to contribute to this project via Issue Reports and Pull Requests.

data-product-streaming's People

Contributors

abdale, amanjeetsingh, analyticjeremy, bjcmit, esbran, hallihan, marvinbuss, mboswell, microsoft-github-policy-service[bot], microsoftopensource, sudivate, vanwinkelseppe, xigyenge

data-product-streaming's Issues

Bug: Not existing Synapse Dedicated SQL Pool causing errors in diagnosticsettings.bicep

Deployment Mode

Azure DevOps

Steps to reproduce

Deploying new diagnostic settings with the SQL pool setting set to "false" causes a "not found" error in the services/diagnosticsettings.bicep deployment. This occurs in both the Streaming and Batch products. The Analytics product appears to include similar declarations, so it is likely affected as well.

Error Message

{
    "status": "Failed",
    "error": {
        "code": "ResourceNotFound",
        "message": "The Resource 'Microsoft.Synapse/workspaces/xx-dev-synapse001/sqlPools/sqlPool001' under resource group 'xx-dev-di002' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"
    }
}
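
One way to guard against this error from the outside is a pre-flight check that confirms the dedicated SQL pool exists before its diagnostic settings are deployed. The sketch below is an assumption about how such a check could look, not the fix applied in the template: the function names are hypothetical, and the subscription ID is a placeholder you would supply yourself. The resource ID follows the path shown in the error message.

```shell
#!/usr/bin/env bash

# Build the ARM resource ID of a Synapse dedicated SQL pool.
#   $1 subscription ID, $2 resource group, $3 workspace name, $4 pool name
sql_pool_resource_id() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Synapse/workspaces/%s/sqlPools/%s\n' \
        "$1" "$2" "$3" "$4"
}

# Return success only if the SQL pool can be found. Takes the same four
# arguments as sql_pool_resource_id. Any lookup failure (including a
# missing login) is treated as "pool not found".
sql_pool_exists() {
    az resource show --ids "$(sql_pool_resource_id "$@")" --output none 2>/dev/null
}
```

With the names from the error above, sql_pool_exists <subscription-id> xx-dev-di002 xx-dev-synapse001 sqlPool001 would fail, which signals that the pool's diagnostic settings should be skipped for this deployment.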

Several errors when deploying Data Product Streaming

Describe the bug

Several deployment errors appear when deploying Data Product Streaming:

  • during synapse deployment
  • during cosmos deployment
  • during role assignment

Steps to reproduce

  1. Deploy DMZ and DLZ using global template
  2. Deploy Streaming Data Product in resource group 001 created during step 1

Screenshots

Synapse -> Seems related to Private endpoint deployment

Cosmos -> Seems related to Private endpoint deployment

Role Assignment -> Principal Not found

Update Bicep version and parameter file values

Following updates are required:

  1. Update the Bicep template based on the new Bicep release.
  2. Update parameter files for test and prod to include newer naming convention for test and prod environments.

Issue Template Improvement required

Syntax of one Issue template is incorrect and must be fixed (Documentation Issue). The empty author section must be removed and the empty title is also not allowed. Also, the default title of other Issues can be simplified.

Add infobox in Portal

Describe the solution you'd like

  • Add infobox in Portal that this is used for integration and product

Region specification and key vault purge protection

I tried to deploy the data domain in a different region than the data landing zone, and the deployment failed with an error.

During the failed deployment, the key vault was created inside the domain rg. I had to delete the domain rg so I can re-create it under the same region as the data landing zone.

The second deployment failed as well, this time with a different error.

Assuming I want to keep the same name for my domain rg, it seems I cannot, because the deleted key vault has not been purged. Purging the key vault from the portal fails because purge protection was enabled when it was created. Even if I recover the key vault in order to disable purge protection, I cannot do so: once enabled, purge protection cannot be disabled.

So basically I have no option but to change the name of my data domain and redeploy.

Two potential changes could prevent this scenario:

  • Update the prerequisites to say that the region of the data domain must be the same as the data landing zone instead of saying choose any region
  • Disable purge protection on the key vault
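
Until either change is made, both failure modes can be checked before deploying. The following is a minimal sketch, assuming the Azure CLI is installed and signed in. The function names are hypothetical, and the reference resource group is whichever existing group in the data landing zone you choose to compare against.

```shell
#!/usr/bin/env bash

# Succeed only if the given resource group lives in the expected region.
#   $1 name of an existing resource group in the data landing zone
#   $2 region planned for the data domain, for example westeurope
same_region() {
    local rg_location
    rg_location="$(az group show --name "$1" --query location --output tsv)"
    [ "$rg_location" = "$2" ]
}

# Succeed if a soft-deleted key vault still holds the given name, which
# means a new vault with that name cannot be created yet.
#   $1 key vault name
vault_name_is_blocked() {
    local hits
    hits="$(az keyvault list-deleted --query "[?name=='$1'].name" --output tsv)"
    [ -n "$hits" ]
}
```

Running same_region before the first deployment avoids creating a key vault in the wrong region at all. If vault_name_is_blocked succeeds and the vault has purge protection enabled, the name stays reserved until the retention period ends.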

Documentation: Name change

Documentation Issue

Please change all instances of "Enterprise-scale analytics and ai" to "Data Management and Analytics Scenario" to align with the marketing request. This should happen for one-clicks and all documentation.

ACTION REQUIRED: Microsoft needs this private repository to complete compliance info

There are open compliance tasks that need to be reviewed for your data-node002 repo.

Action required: 4 compliance tasks

To bring this repository to the standard required for 2021, we require administrators of this and all Microsoft GitHub repositories to complete a small set of tasks within the next 60 days. This is critical work to ensure the compliance and security of your Azure GitHub organization.

Please take a few minutes to complete the tasks at: https://repos.opensource.microsoft.com/orgs/Azure/repos/data-node002/compliance

  • The GitHub AE (GitHub inside Microsoft) migration survey has not been completed for this private repository
  • No Service Tree mapping has been set for this repo. If this team does not use Service Tree, they can also opt-out of providing Service Tree data in the Compliance tab.
  • No repository maintainers are set. The Open Source Maintainers are the decision-makers and actionable owners of the repository, irrespective of administrator permission grants on GitHub.
  • Classification of the repository as production/non-production is missing in the Compliance tab.

You can close this work item once you have completed the compliance tasks, or it will automatically close within a day of taking action.

If you no longer need this repository, it might be quickest to delete the repo, too.

GitHub inside Microsoft program information

More information about GitHub inside Microsoft and the new GitHub AE product can be found at https://aka.ms/gim or by contacting [email protected]

FYI: current admins at Microsoft include @esbran, @daltondhcp, @marvinbuss

Test

Documentation Issue

Test

Microsoft.Devices Registration needed prior to deployment

Describe the bug

Error: ERROR: Deployment failed. Correlation ID: 87bf6b9a-7ce4-4669-b6a8-14203a0c974d. ***
  "error": ***
    "code": "MissingSubscriptionRegistration",
    "message": "The subscription is not registered to use namespace 'Microsoft.Devices'. See https://aka.ms/rps-not-found for how to register subscriptions.",
    "details": [
      ***
        "code": "MissingSubscriptionRegistration",
        "target": "Microsoft.Devices",
        "message": "The subscription is not registered to use namespace 'Microsoft.Devices'. See https://aka.ms/rps-not-found for how to register subscriptions."
      ***
    ]
  ***
***

Error: The process '/usr/bin/az' failed because one or more lines were written to the STDERR stream
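
The missing registration can be resolved before the deployment runs. The sketch below shows one possible pre-deployment step with the Azure CLI. The function name ensure_provider_registered is hypothetical, and the sketch assumes the signed-in identity is allowed to register resource providers on the subscription.

```shell
#!/usr/bin/env bash

# Register a resource provider namespace on the active subscription unless
# it is already registered.
#   $1 namespace, for example Microsoft.Devices
ensure_provider_registered() {
    local namespace="$1" state
    state="$(az provider show --namespace "$namespace" --query registrationState --output tsv)"
    if [ "$state" = "Registered" ]; then
        echo "$namespace is already registered"
    else
        echo "Registering $namespace (current state: $state)"
        # --wait blocks until the registration has completed.
        az provider register --namespace "$namespace" --wait
    fi
}
```

For the error above, the call would be ensure_provider_registered Microsoft.Devices, run once per subscription before the template deployment.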
