Giter Club home page Giter Club logo

bigquery_information_schema_block's Introduction

BigQuery Information Schema Looker Block

This repository contains a Looker block for monitoring and optimizing Google BigQuery usage and performance, built off of the Information Schema tables provided natively within BigQuery.

Setup

Manifest

The manifest.lkml file contains a number of parameters to be configured. Review the inline comments in the manifest file for more details.

Required BQ Permissions

Generally, the required permissions (listed below) are available to the BigQuery Resource Admin and BigQuery Admin roles.

This block requires a Service Account with the following BigQuery permissions:

Highlighted Datasets / Explores

  • Jobs
    • The Jobs explore allows users to explore one of the four JOBS_BY_X tables
    • The table contains one row for each job, up to a retention limit of 180 days.
    • This is the richest dataset in this block, and can be used to answer questions including: job volume, slot usage, job timing and performance metrics, data volume, and many attributes that explain performance.
    • The explore also exposes nested fields, such as referenced tables, job stages, and job stage steps.
    • The specific table explored depends on the scope specified in your manifest.lkml file.
    • Depending on the scope, the SQL text of jobs may or may not be available. If it is, dimensions are also provided that extract Looker contextual information, such as history ID or user ID, from the SQL text
    • If you want to explore a JOBS_BY_X other than the one specified in your scope, two hidden explores exist that explicitly override the scope: jobs_in_project and jobs_in_organization
  • Jobs Timeline
    • The Jobs Timeline explore allows users to explore one of the four JOBS_TIMELINE_BY_X tables
    • The table contains one row for each second that each job was active.
    • Although this explore exposes fewer fields & nested detail than the jobs explore, it can be more convenient for aggregating certain metrics based on when queries were running, rather than based on when queries were created.
    • The specific table explored depends on the scope specified in your manifest.lkml file.
    • If you want to explore a JOBS_TIMELINE_BY_X other than the one specified in your scope, two hidden explores exist that explicitly override the scope: jobs_timeline_in_project and jobs_timeline_in_organization
  • Reservations data
  • Cross-table explore
    • Although it exposes fewer details than the dedicated Jobs explore, the "All" explore joins together multiple fact tables in a way that can facilitate certain analyses, such as measuring slots usage and slot capacity co-dimensioned at the project leve.l

Highlighted Dashboards

Pulse dashboard thumbnail preview
  • Pulse - The Pulse dashboard is a self-contained top-level dashboard that you can use to understand current consumption, performance and efficiency, with the use of Week-to-date and week-over-week metrics throughout.
  • Time Window Investigation - The Time Window Investigation dashboard focuses on a specific timerange, removing any comparisons to prior periods, and providing tiles that can help you pick out problematic patterns or outliers to drill into.
  • Job Lookup - Primarily intended to be accessed from the ellipsis menu from any Job ID as a "drill across", the Job Lookup dashboard is a deep-dive into a single job, letting you see both job metrics as well as step-by-step query plans, and information about referenced tables.

Concepts

Slots and Capacity

A key concept for understanding BigQuery usage is slots. BigQuery uses Slots (virtual CPUs) to execute queries in a heavily-distributed parallel architecture. Customers on the flat-rate pricing model explicitly choose how many slots to reserve, also known as Slot Commitments, which can be purchased at the Annual, Monthly, or Flex (60-second) level. Queries run within that capacity, and you pay for that capacity continuously every second it's deployed. For example, if you purchase 2,000 BigQuery slots, your queries in aggregate are limited to using 2,000 virtual CPUs at any given time. You will have this capacity until you delete it, and you will pay for 2,000 slots until you delete them.

Bytes Shuffled to Disk

The percentage of data written to shuffle and spilled to disk can be a good indicator of queries that are overwhelming slot resources and could be further optimized, for example, queries with heavy data skews. Use the Job Lookup Dashboard to drill into individual queries that are spilling heavy volumes of data to disk, viewing their individual stages and identifying opportunities for optimization.

% of Cached Queries

A higher percentage of Cached queries will result in lower on-demand costs and lower resource utilization. In addition to reducing costs, queries that use cached results are significantly faster because BigQuery does not need to compute the result set.

BigQuery Information Schema Data Structure

More information on the Information Schema can be found in Google Cloud documentation.

bigquery_information_schema_block's People

Contributors

annaserova avatar jeffrey-martinez avatar davidbrinegar avatar manmat avatar davidchiaramonte avatar mauriziodimatteo avatar agregori97 avatar jeremytchang avatar shylaja4mile avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.