faro's Introduction

Overview

faro is a fast, simple, and intuitive SQL-driven data analysis library for Python. It is built on top of SQLite and is intended to complement existing data analysis packages in the Python ecosystem, such as numpy, pandas, and matplotlib, by providing easy interoperability between them. It also integrates with Jupyter by default to provide readable, interactive displays of queries and tables.

Usage

Create a Database object and give it a name.

from faro import Database

db = Database('transportation')

To add tables to the in-memory database, pass the name of a file to add_table, which inserts the contents of the file into a new table within the database. Supported file types are csv, json, and xlsx; add_table detects the file type automatically and parses the contents accordingly. In this example we load two tables, one from a json file and the other from a csv file.

db.add_table('cars.json', name='cars')
db.add_table('airports.csv', name='airports')
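
To make the idea of file-type detection concrete, here is a minimal sketch of loading a file into an in-memory SQLite table based on its extension. It uses only the standard library and is not faro's implementation; the names read_rows and load_file_into_table are hypothetical, and the JSON branch assumes an array of flat objects that share the same keys.

```python
import csv
import json
import sqlite3
from pathlib import Path


def read_rows(path):
    """Return (column_names, rows) for a .csv or .json file, chosen by extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        with open(path, newline="") as f:
            reader = csv.reader(f)
            columns = next(reader)  # first line holds the header
            rows = [tuple(r) for r in reader]
    elif suffix == ".json":
        # Assumption: a JSON array of flat objects with identical keys.
        with open(path) as f:
            records = json.load(f)
        columns = list(records[0])
        rows = [tuple(rec[c] for c in columns) for rec in records]
    else:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return columns, rows


def load_file_into_table(conn, path, name):
    """Create table `name` in `conn` and fill it with the contents of `path`."""
    columns, rows = read_rows(path)
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{name}" ({col_sql})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)
    conn.commit()
```

Values read from a csv file arrive as strings in this sketch; a real loader would likely infer column types as well.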

We can also pass pandas.DataFrame or faro.Table objects directly to add_table. When a file needs more complex parsing, a helpful pattern is to read it into memory with pandas first, then add the resulting DataFrame to the faro.Database.

import pandas as pd

buses = pd.DataFrame({
  'id' : [1, 2, 3, 4, 5],
  'from' : ['Houston', 'Atlanta', 'Chicago', 'Boston', 'New York'],
  'to' : ['San Antonio', 'Charlotte', 'Milwaukee', 'Cape Cod', 'Buffalo']
})

db.add_table(buses, name='buses')

Alternatively, we can assign directly to a table name as a property of the db.table object. Note that this method always replaces the entire table, without the options that add_table() offers.

db.table.buses = buses

We can now query against any table in the database using pure SQL, and easily interact with the results in a Jupyter Notebook.

sql = """
SELECT iata,
       name,
       city,
       state
  FROM airports
 WHERE country = 'USA'
 LIMIT 5
"""
db.query(sql)
   iata  name                  city              state
0  00M   Thigpen               Bay Springs       MS
1  00R   Livingston Municipal  Livingston        TX
2  00V   Meadow Lake           Colorado Springs  CO
3  01G   Perry-Warsaw          Perry             NY
4  01J   Hilliard Airpark      Hilliard          FL

If we want to interact with the data returned by the query, we can easily transform it into whatever data type is most convenient for the situation. Supported type conversions include: List[Tuple], Dict[List], numpy.ndarray, and pandas.DataFrame.

table = db.query(sql)
type(table)
>>> faro.table.Table

df = table.to_dataframe()
type(df)
>>> pandas.core.frame.DataFrame

matrix = table.to_numpy()
type(matrix)
>>> numpy.ndarray
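
For readers curious what the List[Tuple] and Dict[List] shapes look like, the following sketch builds both from a raw sqlite3 cursor. It is an illustration using the standard library only, not faro's conversion code; the column names origin and destination are stand-ins chosen to avoid quoting the SQL keywords from and to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buses (id INTEGER, origin TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO buses VALUES (?, ?, ?)",
    [(1, "Houston", "San Antonio"), (2, "Atlanta", "Charlotte")],
)

cursor = conn.execute("SELECT id, origin, destination FROM buses ORDER BY id")
# cursor.description holds one 7-tuple per column; the first item is the name.
columns = [d[0] for d in cursor.description]

# List[Tuple]: one tuple per row.
rows = cursor.fetchall()

# Dict[List]: one list per column, keyed by column name.
as_dict = {name: [row[i] for row in rows] for i, name in enumerate(columns)}
```

The column-oriented dictionary is the shape that pandas.DataFrame accepts directly, and the list of row tuples can be passed to numpy.array.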

We can also interact with the tables in our database by accessing them as properties of the table object. For example:

db.table.buses
   id  from      to
1  1   Houston   San Antonio
2  2   Atlanta   Charlotte
3  3   Chicago   Milwaukee
4  4   Boston    Cape Cod
5  5   New York  Buffalo

faro's People

Contributors

alxwrd, dependabot[bot], yanniskatsaros

faro's Issues

Get all column names and types from a table in the database

Feature Enhancement

Get the names of all columns (and their types) from a particular table in the database.

Example Usage

from faro import Database

db = Database('example')
db.add_table('test.csv', 'test')

db.columns('test')
>>> ['column_1', 'column_2', 'column_3']

db.info('test')
>>> {'column_1' : str, 'column_2' : int, 'column_3' : float}
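
One possible way to back this feature, sketched here against a plain sqlite3 connection rather than faro's internals, is SQLite's PRAGMA table_info statement. The function names table_info and table_columns and the SQLITE_TO_PYTHON mapping are hypothetical, and the mapping covers only the basic declared types.

```python
import sqlite3

# Hypothetical mapping from SQLite declared types to Python types.
SQLITE_TO_PYTHON = {"TEXT": str, "INTEGER": int, "REAL": float, "BLOB": bytes}


def table_info(conn, table):
    """Map each column name of `table` to a Python type."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return {
        name: SQLITE_TO_PYTHON.get(declared.upper(), object)
        for _, name, declared, *_ in rows
    }


def table_columns(conn, table):
    """Return the column names of `table` in declaration order."""
    return list(table_info(conn, table))
```

Columns with an unrecognized or missing declared type fall back to object in this sketch.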

Remove underlying `pandas` dependency.

To minimize "bloat" in the library, faro could become a "pure-Python" package by removing the pandas dependency from the underlying operations and instead using custom data structures built on namedtuple or, on Python 3.7 and later, dataclass. This would mainly affect the underlying implementation of the faro.Table class.

This decision would affect the direction of the package in two major ways.

  1. It would restrict users to Python >= 3.7 (due to the use of dataclasses).
  2. It would require a rewrite of all pandas-dependent operations.

Conversion from a faro.Table to a numpy.ndarray or a pandas.DataFrame would still be supported, but numpy and pandas would become optional dependencies that the user installs as needed.
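
As a rough illustration of what a pandas-free table might look like, the sketch below stores column names and row tuples in a dataclass and imports pandas only when a DataFrame is requested. The class name SimpleTable and its methods are hypothetical and are not part of faro.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class SimpleTable:
    """Hypothetical pandas-free table: column names plus row tuples."""

    columns: Tuple[str, ...]
    rows: Tuple[Tuple[Any, ...], ...]

    def __len__(self) -> int:
        return len(self.rows)

    def to_list(self) -> List[Tuple[Any, ...]]:
        """Row-oriented view: one tuple per row."""
        return list(self.rows)

    def to_dict(self) -> Dict[str, List[Any]]:
        """Column-oriented view: one list per column."""
        return {c: [r[i] for r in self.rows] for i, c in enumerate(self.columns)}

    def to_dataframe(self):
        # Imported lazily so that pandas stays an optional dependency.
        import pandas as pd

        return pd.DataFrame(self.to_dict())
```

With this layout, only users who call to_dataframe need pandas installed.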

Get a table object directly from the database

Feature Enhancement

Directly query the database and get the contents of an entire table without having to write db.query('SELECT * FROM table') each time.

Example Usage

from faro import Table, Database

db = Database('example')
db.add_table('example.csv', name='example')

table : Table = db.table('example')
type(table)
>>> faro.table.Table
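
The requested shortcut amounts to wrapping a SELECT * query. A minimal sketch against a plain sqlite3 connection follows; the function name fetch_table is hypothetical, and the existence check is one assumed way to guard the table name, since SQLite cannot bind identifiers as query parameters.

```python
import sqlite3


def fetch_table(conn, name):
    """Return (column_names, rows) for every row of table `name`."""
    # Table names cannot be bound as parameters, so confirm the table exists first.
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    if exists is None:
        raise KeyError(f"no such table: {name}")
    cursor = conn.execute(f'SELECT * FROM "{name}"')
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()
```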
