ETlite

Extract/Transform Light - a simple library for reading delimited files.

Example

Given a CSV file:

Area id,Male,Female,Area
A12345,34,45,0.25
A12346,108,99,0.32

Define a list of transformations:

transformations = [
    # Map existing fields into dictionary.
    # For nested dictionaries use dot.delimited.keys.
    # Optional "via" parameter takes a callable returning transformed value.
    { "from": "Area id", "to": "id" },
    { "from": "Male", "to": "population.male", "via": int },
    { "from": "Female", "to": "population.female", "via": int },
    { "from": "Area", "to": "area", "via": float },

    # You can also add computed values that are not present in the original data source.
    # Computed values take the transformed dictionary as an argument
    # and do not require a "from" parameter:
    {
        "to": "population.total",
        "via": lambda x: x['population']['male'] + x['population']['female']
    },
    # Note that transformations are executed in the order they were defined.
    # This transformation uses population.total value computed in the previous step:
    {
        "to": 'population.density',
        "via": lambda x: round(x['population']['total'] / x['area']),
    }
]
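
The dot-delimited "to" keys above describe a path into nested dictionaries. As a rough illustration of that idea only (this is not ETlite's actual implementation), the hypothetical helper set_nested below walks such a path and creates intermediate dictionaries as needed:

```python
def set_nested(target, dotted_key, value):
    """Store value in target under a dot-delimited key path (hypothetical helper)."""
    *parents, leaf = dotted_key.split(".")
    node = target
    for name in parents:
        # Create the intermediate dictionary if it does not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value
    return target


record = {}
set_nested(record, "id", "A12345")
set_nested(record, "population.male", int("34"))
set_nested(record, "population.female", int("45"))
print(record)
```

Because later keys reuse dictionaries created by earlier ones, "population.male" and "population.female" end up side by side under a single "population" key.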

Read the file:

from etlite import delim_reader

with open("mydatafile.csv") as csvfile:
    reader = delim_reader(csvfile, transformations)
    data = list(reader)

This produces a list of dictionaries:

[
    {
        'id': 'A12345',
        'area': 0.25,
        'population': {
            'male': 34,
            'female': 45,
            'total': 79,
            'density': 316
        }
    },
    {
        'id': 'A12346',
        'area': 0.32,
        'population': {
            'male': 108,
            'female': 99,
            'total': 207,
            'density': 647
        }
    }
]

delim_reader options

ETlite is a thin wrapper around Python's built-in csv module, so delim_reader accepts the same options as csv.reader. For example:

reader = delim_reader(csvfile, transformations, delimiter="\t")
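
To see what such an option does on its own, here is a small sketch that uses only the standard csv module, with an in-memory stream standing in for a tab-delimited file. It demonstrates csv.reader behavior, not delim_reader itself:

```python
import csv
import io

# In-memory stand-in for a tab-delimited file (sample data from the example above).
stream = io.StringIO("Area id\tMale\tFemale\tArea\nA12345\t34\t45\t0.25\n")

# delimiter is a standard csv.reader formatting parameter.
rows = list(csv.reader(stream, delimiter="\t"))
print(rows)
```

Other csv formatting parameters, such as quotechar, are set the same way as keyword arguments.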

Exception handling

If a transformation cannot be performed, ETlite raises TransformationError. If you do not want to abort data loading, you can pass an error handler to delim_reader.

The error handler must be a function. It is passed an instance of TransformationError. Note: on_error must be passed as a keyword argument.

from etlite import delim_reader

transformations = [
    # ...
]

def error_handler(err):
    # err is an instance of TransformationError
    print(err) # prints error message
    print(err.record) # prints raw record, prior to transformation


with open('my-data.csv') as stream:
    reader = delim_reader(stream, transformations, on_error=error_handler)
    for row in reader:
        do_something(row)
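
A handler does not have to print. One possible variation, sketched below, collects failures in a list so they can be reviewed after loading finishes. The names make_collecting_handler and FakeTransformationError are hypothetical: the fake class only mimics the record attribute described above so that the sketch runs without ETlite, and the list shape of the raw record is an assumption.

```python
def make_collecting_handler(errors):
    """Return an error handler that records failures instead of printing them."""
    def handler(err):
        # Keep the message and the raw record for later inspection.
        errors.append((str(err), err.record))
    return handler


class FakeTransformationError(Exception):
    """Stand-in for illustration only; it just mimics the `record` attribute."""
    def __init__(self, message, record):
        super().__init__(message)
        self.record = record


errors = []
handler = make_collecting_handler(errors)

# Simulate one failed row; the raw record shape here is assumed.
handler(FakeTransformationError("cannot convert value", ["A12347", "n/a", "12", "0.1"]))
print(errors)
```

With the real library, the function returned by make_collecting_handler would be passed as on_error=handler in place of error_handler.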

Contributors

shelldweller