Enterprise-Data-Analytics-Demo

Deploys an end-to-end working demo of data analytics and data processing on Google Cloud. All the services are connected, configured, and ready to run. All the artifacts are deployed, so you can start using the demo immediately.

Deploying using Cloud Shell

You can deploy this to a new project or an existing project.

  • New Project:
    • This requires you to be an Org Admin. This is a great fit for personal projects or when IT is running the script.
  • Existing Project:
    • This requires a project to be created in advance. IT will typically create and provide a service account that is used to deploy. Alternatively, IT can allow you to impersonate the service account (more secure than exporting a JSON credential file).

To deploy to New Project (Preferred method)

  1. Open a Google Cloud Shell: http://shell.cloud.google.com/
  2. Type: git clone https://github.com/GoogleCloudPlatform/enterprise-data-analytics-demo
  3. Switch the prompt to the directory: cd enterprise-data-analytics-demo
  4. Run the deployment script: source deploy.sh
  5. Authorize the login (a popup will appear)
  6. Follow the prompts: Answer “Yes” for each.
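
Steps 2 to 4 above can be collected into a small shell function. This is only a sketch: the function name `deploy_new_project` is an assumption made for illustration and is not part of the repository, while the repository URL, directory name, and `deploy.sh` come from the steps above. The login popup and the prompts (steps 5 and 6) still need to be answered by hand.

```shell
# Hypothetical helper (the name is not part of the repository).
# Wraps steps 2-4; intended to be run inside Cloud Shell.
deploy_new_project() {
  repo_url="https://github.com/GoogleCloudPlatform/enterprise-data-analytics-demo"
  repo_dir="enterprise-data-analytics-demo"

  # Step 2: clone the repository unless the directory already exists.
  if [ ! -d "$repo_dir" ]; then
    git clone "$repo_url" || return 1
  fi

  # Step 3: switch to the repository directory.
  cd "$repo_dir" || return 1

  # Step 4: run the deployment script in the current shell.
  # ". ./deploy.sh" is the portable spelling of "source deploy.sh".
  . ./deploy.sh
}
```

Call `deploy_new_project` from the directory where the clone should live. Because the script is sourced rather than executed, it runs in the current shell session.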

To deploy to an Existing Project

  1. Review the code in deploy-use-existing-project.sh.
  2. You should have a project and a service account with the Owner role.
  3. Hard code the project and service account information into the script. The script contains code that "emulates" someone else creating a project.

After the deployment

  • Open Cloud Composer. You will see the Run-All-Dags DAG running. This runs the DAGs needed to seed the project with data. Once it finishes, you can run the BigQuery stored procedures and other items in the demo.

Possible Errors:

  1. If the script fails to enable a service or times out, rerun it. If that does not work, run ./clean.sh and start over.
  2. If the script reports a security-related message (unauthorized), double-check the configured roles/IAM security.
  3. If you get the error "Error: Error when reading or editing Project Service : Request List Project Services data-analytics-demo-xxxxxxxxx returned error: Failed to list enabled services for project data-analytics-demo-xxxxxxxxx: Get "https://serviceusage.googleapis.com/v1/projects/data-analytics-demo-xxxxxxxxx/services?alt=json&fields=services%2Fname%2CnextPageToken&filter=state%3AENABLED&prettyPrint=false"", you need to start over. Run ./clean.sh and then run source deploy.sh again. This happens when the Service Usage API does not propagate within 4 minutes.
    • Delete your failed project
  4. If you get a "networking error" with a dial tcp message [2607:f8b0:4001:c1a::5f], then your Cloud Shell had a networking glitch; the Terraform network is not at fault. Restart the deployment with "source deploy.sh". (e.g. Error creating Network: Post "https://compute.googleapis.com/compute/beta/projects/bigquery-demo-xvz1143xu9/global/networks?alt=json": dial tcp [2607:f8b0:4001:c1a::5f]:443: connect: cannot assign requested address)
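
The error list above amounts to a lookup from message text to recovery step. A minimal sketch of that lookup follows. The function name `suggest_recovery` and the wording of its output are assumptions made for illustration; only the matched substrings and the recovery actions come from the list.

```shell
# Hypothetical helper: maps an error message to the recovery step listed above.
suggest_recovery() {
  case "$1" in
    *"Failed to list enabled services"*)
      # Service Usage API did not propagate in time.
      echo "Start over: run ./clean.sh, then run source deploy.sh again" ;;
    *"dial tcp"*)
      # Cloud Shell networking glitch, not the Terraform network.
      echo "Restart the deployment: source deploy.sh" ;;
    *[Uu]nauthorized*)
      echo "Double-check the configured roles/IAM security" ;;
    *)
      # Service enablement failures, timeouts, and anything unrecognized.
      echo "Rerun the script; if it fails again, run ./clean.sh and start over" ;;
  esac
}
```

For example, `suggest_recovery "$(tail -n 1 deploy.log)"` would print a suggestion for the last line of a saved log, where `deploy.log` is a hypothetical file that you would have to capture yourself.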

Folders

  • cloud-composer
    • dags - all the DAGs for Airflow which run the system and seed the data
    • data - all the bash and SQL scripts to deploy
  • dataflow
    • Dataflow job that connects to the public Pub/Sub sample streaming taxi data. You start this job using Composer.
  • dataproc
    • Spark code that is used to process the initial downloaded data
  • notebooks
    • Sample notebooks that can be run in Vertex AI. To create the managed notebook, use the DAG in Composer.
  • sql-scripts
    • The BigQuery SQL sample scripts. These are currently deployed as stored procedures. You can edit each stored procedure and run the sample code query by query.
  • terraform
    • the entry point when deploying via Cloud Shell or your local machine. This uses service account impersonation.
  • terraform-modules
    • api - enables the GCP APIs
    • org-policies - sets organization policies at the project level that have to be "disabled" to deploy the resources.
    • org-policies-deprecated - an older approach for org policies that is needed when your Cloud Build account is in a different domain
    • project - creates the cloud project if a project number is not provided
    • resources - the main set of resources to deploy
    • service-account - creates a service account if a project number is not provided. The service account will be impersonated during the deployment.
    • service-usage - enables the service usage API as the main user (non-impersonated)
    • sql-scripts - deploys the sql scripts
