Giter Club home page Giter Club logo

csvs-to-sqlite's Introduction

csvs-to-sqlite

PyPI Travis CI

Convert CSV files into a SQLite database

Basic usage:

csvs-to-sqlite myfile.csv mydatabase.db

This will create a new SQLite database called mydatabase.db containing a single table, myfile, containing the CSV content.

You can provide multiple CSV files:

csvs-to-sqlite one.csv two.csv bundle.db

The bundle.db database will contain two tables, one and two.

This means you can use wildcards:

csvs-to-sqlite ~/Downloads/*.csv my-downloads.db

If you pass a path to one or more directories, the script will recursively search those directories for CSV files and create tables for each one.

csvs-to-sqlite ~/path/to/directory all-my-csvs.db

Refactoring columns into separate lookup tables

Let's say you have a CSV file that looks like this:

county,precinct,office,district,party,candidate,votes
Clark,1,President,,REP,John R. Kasich,5
Clark,2,President,,REP,John R. Kasich,0
Clark,3,President,,REP,John R. Kasich,7

(Real example taken from the Open Elections project)

You can now convert selected columns into separate lookup tables using the new --extract-column option (shortname: -c) - for example:

csvs-to-sqlite openelections-data-*/*.csv \
    -c county:County:name \
    -c precinct:Precinct:name \
    -c office -c district -c party -c candidate \
    openelections.db

The format is as follows:

column_name:optional_table_name:optional_table_value_column_name

If you just specify the column name e.g. -c office, the following table will be created:

CREATE TABLE "office" (
    "id" INTEGER PRIMARY KEY,
    "value" TEXT
);

If you specify all three options, e.g. -c precinct:Precinct:name the table will look like this:

CREATE TABLE "Precinct" (
    "id" INTEGER PRIMARY KEY,
    "name" TEXT
);

The original tables will be created like this:

CREATE TABLE "ca__primary__san_francisco__precinct" (
    "county" INTEGER,
    "precinct" INTEGER,
    "office" INTEGER,
    "district" INTEGER,
    "party" INTEGER,
    "candidate" INTEGER,
    "votes" INTEGER,
    FOREIGN KEY (county) REFERENCES County(id),
    FOREIGN KEY (party) REFERENCES party(id),
    FOREIGN KEY (precinct) REFERENCES Precinct(id),
    FOREIGN KEY (office) REFERENCES office(id),
    FOREIGN KEY (candidate) REFERENCES candidate(id)
);

They will be populated with IDs that reference the new derived tables.

Installation

pip install csvs-to-sqlite

csvs-to-sqlite --help

Usage: csvs-to-sqlite [OPTIONS] PATHS... DBNAME

  PATHS: paths to individual .csv files or to directories containing .csvs

  DBNAME: name of the SQLite database file to create

Options:
  --replace-tables           Replace tables if they already exist
  -c, --extract-column TEXT  One or more columns to 'extract' into a separate
                             lookup table. If you pass a simple column name
                             that column will be replaced with integer foreign
                             key references to a new table of that name. You
                             can customize the name of the table like so:

                                 --extract-column state:States:state_name

                             This will pull unique values from the 'state'
                             column and use them to populate a new 'States'
                             table, with an id column primary key and a
                             state_name column containing the strings from the
                             original column.
  -f, --fts TEXT             One or more columns to use to populate a full-
                             text index
  --version                  Show the version and exit.
  --help                     Show this message and exit.

csvs-to-sqlite's People

Contributors

simonw avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.