Giter Club home page Giter Club logo

bqtools's Introduction

Build Status PyPI version

Python Tools for BigQuery

Why?

For data collection and data exploration, we like to work with BigQuery. But we have not found a python library, to easily handle recurring tasks like adding new data (of potentially inconsistent schema) and schema migrations. So we took a couple of our solutions for those tasks and put them into this library.

What?

bqtools provides a light-weight solution to explicit schema management with python-native types (unlike pandas dtype) and some convenient type checking, inference and conversions. Table-objects created by bqtools can be read from BigQuery, stored locally, read from a local file and written to BigQuery. Table schemas can be changed and data can be added or modified.

Install

pip install --upgrade bqtools

Examples:

Create basic tables

from fourtytwo import bqtools

schema = [
    {'name': 'number', 'field_type': 'INTEGER'},
    {'name': 'text', 'field_type': 'STRING'},
    {'name': 'struct', 'field_type':'RECORD', 'mode':'REPEATED', 
        'fields': [
            {'name':'integer', 'field_type':'INTEGER'},
            {'name':'text', 'field_type':'STRING'}
        ]
    }
]
# valid BigQuery types see: 
# https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
# geo and array are currently not/not fully supported

# data = columns of lists
table = bqtools.BQTable(
    schema=schema, 
    data=[[1, 2, 3, 4], ['a', 'b', 'c', 'd']]
)

# data = rows of dicts
table = bqtools.BQTable(
    schema=schema, 
    data=[
        {'number': 1, 'text': 'a'}, 
        {'number': 2, 'text': 'b'},
        ...
    ]
)

View data

print(table.data)       # list of all columns
print(table.rows(n=10)) # list of first n rows

# convert to pandas.DataFrame
df = table.to_df()               
# warning: pandas dtypes may be inconsistent 
# with BigQuery Schema field_types

Append data

rows = [{'number': 5, 'text': 'e'}]
table.append(rows)

row = [[6, 'f']]
table.append(rows)

Load table from BigQuery

# requires environment variable GOOGLE_APPLICATION_CREDENTIALS 
# or parameter credentials='path-to-credentials.json'
table = bqtools.read_bq(
    table_ref='project_id.dataset_id.new_table_id', 
    limit=10,           # limit query rows
    schema_only=False   # set True to only add data
)

Modify table schema

# change column order and field_type
new_schema = [
    {'name': 'text', 'field_type': 'STRING'},
    {'name': 'number', 'field_type': 'FLOAT'},
]
table.schema(new_schema)

# change column names
table.rename(columns={'number': 'decimal'})

Write table to BigQuery

# requires environment variable GOOGLE_APPLICATION_CREDENTIALS
# or parameter credentials='path-to-credentials.json'
table.to_bq(table_ref, mode='append')

Persist tables locally

# write to local file (compressed binary format)
table.save('local_table.bqt')

# load from local file
table = bqtools.load('local_table.bqt')

bqtools's People

Contributors

jorahn avatar lingchiachen avatar shahzeb42digital avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

bqtools's Issues

retry failed uploads

handle exception google.api_core.exceptions.InternalServerError: 500 Error encountered during execution. Retrying may solve the problem. in table.to_bq() at step load_job.result() with retry (default n=3) and exponential backoff.

convenience methods to set/add DataFrame to table.data

table.append(df)
def append(rows):
    if isinstance(rows, pd.DataFrame):
        rows = rows.to_dict('records')
    # ...

table.data = df
def _set_data(self, data):
    if isinstance(data, pd.DataFrame):
        data = data.to_dict('records')
    # ...

append fields to schema fails

table.schema += [{'name': 'test', 'field_type': 'INTEGER'}]
table.schema = table.schema + [{'name': 'test', 'field_type': 'INTEGER'}]

Both variations fail with different error messages. See tests/test_bqtools.py: test_bqtools_append_schema

append data to empty table.data fails - required for schema_only

>>> from fourtytwo import bqtools
>>> table = bqtools.read_bq(table_ref='project.dataset.table', schema_only=True)
>>> table.append([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/bqtools/fourtytwo/bqtools/__init__.py", line 224, in append
    for index in range(len(data)):
TypeError: object of type 'NoneType' has no len()

File "/bqtools/fourtytwo/bqtools/__init__.py", line 224, in append:

    def append(self, rows):
        append_columns = _rows_to_columns(rows, self.schema)
        data = self.data
        for index in range(len(data)):
            data[index] += append_columns[index]

Probably replace range(len(data)) with range(len(self.schema))

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.