Dask-BigQuery

Read data from Google BigQuery with Dask.

This package uses the BigQuery Storage API. Please refer to the data extraction pricing table for associated costs while using Dask-BigQuery.

Installation

dask-bigquery can be installed with pip:

pip install dask-bigquery

or with conda:

conda install -c conda-forge dask-bigquery

Authentication

Default credentials can be provided by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the credentials file:

$ export GOOGLE_APPLICATION_CREDENTIALS=/home/<username>/google.json

For information on obtaining the credentials, see the Google API documentation.
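A quick sanity check before running anything against BigQuery can save a confusing error later. The sketch below is not part of dask-bigquery; check_credentials_file is a hypothetical helper, and it assumes the variable points at a service-account key file (other credential types use different fields and would be reported as incomplete).

```python
import json
import os

# Fields present in a Google service-account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key_id", "private_key", "client_email"}


def check_credentials_file(path=None):
    """Return a list of problems with the credentials file (empty if none found).

    Hypothetical helper: falls back to GOOGLE_APPLICATION_CREDENTIALS
    when no path is given.
    """
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isfile(path):
        return [f"no such file: {path}"]
    try:
        with open(path) as f:
            info = json.load(f)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(info, dict):
        return ["expected a JSON object at the top level"]
    # Report each missing field in a stable (sorted) order.
    return [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - info.keys())]
```

Calling check_credentials_file() with no argument inspects whatever GOOGLE_APPLICATION_CREDENTIALS currently points to. It only checks the file's shape and does not confirm that Google accepts the key.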

Example: read from BigQuery

dask-bigquery assumes that you are already authenticated.

import dask_bigquery

ddf = dask_bigquery.read_gbq(
    project_id="your_project_id",
    dataset_id="your_dataset",
    table_id="your_table",
)

ddf.head()

Example: write to BigQuery

With default credentials:

import dask
import dask_bigquery

ddf = dask.datasets.timeseries(freq="1min")

res = dask_bigquery.to_gbq(
    ddf,
    project_id="my_project_id",
    dataset_id="my_dataset_id",
    table_id="my_table_name",
)

With explicit credentials:

import dask_bigquery
from google.oauth2.service_account import Credentials

# Build credentials from service-account info (values elided here)
creds_dict = {"type": ..., "project_id": ..., "private_key_id": ...}
credentials = Credentials.from_service_account_info(info=creds_dict)

# ddf is the Dask DataFrame created in the previous example
res = dask_bigquery.to_gbq(
    ddf,
    project_id="my_project_id",
    dataset_id="my_dataset_id",
    table_id="my_table_name",
    credentials=credentials,
)

Before loading data into BigQuery, to_gbq writes intermediary Parquet files to a Google Cloud Storage bucket. The default bucket name is dask-bigquery-tmp. You can provide a different bucket name by setting the parameter bucket="my-gs-bucket". After the job is done, the intermediary data is deleted.
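As a minimal sketch of passing the bucket parameter described above, the hypothetical helper write_via_bucket below wraps the same to_gbq call as the earlier examples. The project, dataset, and table names are the placeholders already used in this README, and the bucket is assumed to exist and be writable by your credentials.

```python
def write_via_bucket(ddf, bucket="my-gs-bucket"):
    """Write ``ddf`` to BigQuery, staging the Parquet files in ``bucket``.

    Hypothetical wrapper, shown only to illustrate the ``bucket`` parameter.
    """
    # Imported inside the function so that defining this helper does not
    # itself require dask-bigquery to be installed.
    import dask_bigquery

    return dask_bigquery.to_gbq(
        ddf,
        project_id="my_project_id",
        dataset_id="my_dataset_id",
        table_id="my_table_name",
        bucket=bucket,  # overrides the default dask-bigquery-tmp
    )
```

With the ddf from the write example, write_via_bucket(ddf) stages the data in my-gs-bucket instead of the default bucket.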

If you're using a persistent bucket, we recommend configuring a retention policy that ensures the data is cleaned up even in case of job failures.
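In Cloud Storage, automatic clean-up of old objects is configured through a lifecycle rule with a Delete action. The sketch below builds such a configuration and writes it to a file. The function names and the one-day age are illustrative assumptions, not project recommendations; pick an age longer than your longest-running job.

```python
import json


def lifecycle_config(max_age_days=1):
    """Build a lifecycle configuration that deletes objects older than ``max_age_days``."""
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                # "age" is measured in days since the object was created.
                "condition": {"age": max_age_days},
            }
        ]
    }


def write_lifecycle_file(path="lifecycle.json", max_age_days=1):
    """Write the configuration to ``path`` so it can be applied to a bucket."""
    with open(path, "w") as f:
        json.dump(lifecycle_config(max_age_days), f, indent=2)
    return path
```

The resulting file can then be applied to the bucket, for example with gsutil lifecycle set lifecycle.json gs://my-gs-bucket.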

Run tests locally

To run the tests locally, you need to be authenticated and have a project created on that account. If you're using a service account, select the "BigQuery Admin" role in the "Grant this service account access to project" section when creating it.

You can run the tests with

$ pytest dask_bigquery

if your default gcloud project is set, or manually specify the project ID with

$ DASK_BIGQUERY_PROJECT_ID=<your_project_id> pytest dask_bigquery

History

This project stems from the discussion in this Dask issue and this initial implementation developed by Brett Naul, Jacob Hayes, and Steven Soojin Kim.

License

BSD-3
