csvkit's Introduction

wireservice

Name
agate
agate-charts
agate-dbf
agate-excel
agate-lookup
agate-remote
agate-sql
agate-stats
csvkit
leather
proof

csvkit's People

Contributors

acompa, bryant1410, bsilverthorn, dannguyen, dergachev, ewheeler, fgregg, gotoplanb, groutr, jankatins, jayvdb, jeroenjanssens, jest, jez, joegermuska, jpmckinney, karriek, kevinschaul, marksmayo, martinburch, mattdudys, mpettis, nhoffman, onyxfish, reidab, ryanpitts, srstsavage, thatmattbone, thejefflarson, volpino

csvkit's Issues

csvjson

Convert the king of tabular data formats into the king of hierarchical data formats for serialization across the wire.
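A minimal sketch of the conversion, assuming the first CSV row is a header and that every value stays a string. The function name `csv_to_json` is hypothetical and is not csvkit's implementation.

```python
import csv
import io
import json


def csv_to_json(text):
    """Convert CSV text with a header row into a JSON array of objects."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return json.dumps([dict(row) for row in reader])
```

For example, `csv_to_json("a,b\n1,2\n")` yields a JSON array with one object whose keys are `a` and `b`.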

csvr

Either convert input csv to .rdata format or open an R session with data already loaded into a local dataframe.

csvslice

Head, tail, etc. don't work correctly with embedded newlines
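One way to get a `head` that respects embedded newlines is to slice parsed rows rather than physical lines. The sketch below uses only the standard library; `csv_head` is a hypothetical name chosen for illustration.

```python
import csv
import io
from itertools import islice


def csv_head(text, n):
    """Return the header plus the first n data rows of CSV text."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    # islice counts parsed rows, so a quoted field that spans two
    # physical lines still counts as a single row.
    return [header] + list(islice(reader, n))
```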

csvcount

Counts the number of rows in a csv file (wc -l does not work with embedded newlines)
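The difference from `wc -l` can be illustrated with a short sketch that counts parsed rows instead of newline characters. It assumes the first row is a header; `count_rows` is a hypothetical name.

```python
import csv
import io


def count_rows(text):
    """Count data rows in CSV text, excluding the header row."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader, None)  # skip the header if there is one
    return sum(1 for _ in reader)
```

With a file whose second row contains a quoted two-line field, counting newline characters overstates the number of rows, while counting parsed rows does not.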

csvstack

Stack the rows from multiple csv files on top of one another, optionally adding a common grouping value to each set of rows.
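A rough sketch of stacking with an optional grouping column follows. It assumes each table is a list of rows whose first row is the header, and that all tables share that header. The names `stack`, `groups` and `group_name` are hypothetical.

```python
def stack(tables, groups=None, group_name="group"):
    """Stack tables that share a header, optionally tagging each row."""
    out = []
    for i, rows in enumerate(tables):
        header, body = rows[0], rows[1:]
        if i == 0:
            # Emit the header once, with the grouping column first.
            out.append(([group_name] if groups else []) + header)
        for row in body:
            out.append(([groups[i]] if groups else []) + row)
    return out
```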

Add generation of insert statements to csvsql

Should be able to do:

csvsql --insert test.csv | psql

To create a table and load data.

This is very difficult as sqlalchemy dialects only generate parameterized insert statements. Would need to either substitute actual values in (different for each dialect) or find another way of creating insert statements. (Or not use sqlalchemy.)
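To illustrate the "substitute actual values in" option, here is a deliberately naive sketch that does not use sqlalchemy. It assumes standard SQL quoting (single quotes doubled), leaves identifiers unquoted, and ignores dialect differences, which is exactly the hard part described above. The names `sql_literal` and `insert_statement` are hypothetical.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (not dialect-aware)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Standard SQL escapes a single quote by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def insert_statement(table, columns, row):
    """Build one INSERT statement with the values inlined."""
    cols = ", ".join(columns)
    vals = ", ".join(sql_literal(v) for v in row)
    return f"INSERT INTO {table} ({cols}) VALUES ({vals});"
```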

csvuniq

Strip duplicate rows.

As with csvsort, to make up for the fact that internal linebreaks mess with the unix utils.
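Deduplicating parsed rows, rather than physical lines, could look like the sketch below. It keeps the first occurrence of each row and preserves order; `unique_rows` is a hypothetical name.

```python
def unique_rows(rows):
    """Drop duplicate rows, keeping the first occurrence of each."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out
```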

csvsort

Sort entire rows or by columns

To make up for the fact that unix sort chokes on internal linebreaks.
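A sketch of sorting parsed rows either whole or by selected columns is shown below. It assumes the first row is a header and compares values as strings; `sort_rows` and `column_indices` are hypothetical names.

```python
def sort_rows(rows, column_indices=None):
    """Sort data rows by whole row, or by the given column indices."""
    header, body = rows[0], rows[1:]
    if column_indices is None:
        key = lambda row: row
    else:
        key = lambda row: [row[i] for i in column_indices]
    # The header stays on top; only the data rows are sorted.
    return [header] + sorted(body, key=key)
```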

csvgrep

Like unix grep, but allows specifying single columns to filter on (and handles in-row newlines correctly)
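Filtering on a single column might be sketched as follows, assuming rows are already parsed and the first row is a header. The name `grep_rows` and its parameters are hypothetical.

```python
import re


def grep_rows(rows, column, pattern):
    """Keep the header and the rows whose named column matches pattern."""
    header, body = rows[0], rows[1:]
    idx = header.index(column)
    regex = re.compile(pattern)
    return [header] + [row for row in body if regex.search(row[idx])]
```

Because matching happens on a parsed field, a value such as `"line one\nline two"` is tested as one string rather than as two separate lines.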

csvdiff

Identify rows/cells which are different between two spreadsheets
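One simple interpretation is a positional comparison: the two tables are assumed to have rows in the same order, and cells are compared by row and column index. That assumption is mine, not the issue's, and `diff_cells` is a hypothetical name.

```python
from itertools import zip_longest


def diff_cells(left, right):
    """Return (row, col, left_value, right_value) for each differing cell."""
    diffs = []
    # A missing row is treated as an empty row; a missing cell as None.
    for r, (lrow, rrow) in enumerate(zip_longest(left, right, fillvalue=[])):
        for c, (lval, rval) in enumerate(zip_longest(lrow, rrow)):
            if lval != rval:
                diffs.append((r, c, lval, rval))
    return diffs
```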

Try/except block in outer joins isn't atomic

I'm sketched out by the append-in-for-loops inside a try-catch block.

It doesn't look atomic to me. What if an exception were raised midway through adding a row? Some columns which already had a value would also have a None appended, no?

I don't have time to write a test case tonight, so maybe I'm missing something or maybe there's a test for it already and you should just tell me to RTFC.

Anyway, it might be nice to abstract an _outer_join method. I was starting down that path, passing in keys and some rules about when to allow for None values. I'm reading this code trying to decide how to handle the one-to-many case.
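The concern can be illustrated with a sketch that computes every value for a row before touching any column, so a failure leaves the columns unchanged. This is not the code under discussion; it assumes column-oriented storage (one list per column), and `append_row_atomically` is a hypothetical helper.

```python
def append_row_atomically(columns, values):
    """Append one value to each column, or leave all columns untouched."""
    # Materialize the whole row first; any exception raised while
    # computing values happens before a single column is modified.
    new_values = list(values)
    if len(new_values) != len(columns):
        raise ValueError("row width does not match number of columns")
    for column, value in zip(columns, new_values):
        column.append(value)
```

If `values` is a generator that raises partway through, the exception propagates and no column has grown, which avoids the half-appended row described above.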

Missing xlsx module

On OSX I get this when testing out one of the command-line utils:

(csvkit)Derek-Willis-4901:csvkit 200025$ ./in2csv
Traceback (most recent call last):
  File "./in2csv", line 7, in <module>
    from csvkit import convert
  File "/Users/200025/code/csvkit/csvkit/csvkit/convert/__init__.py", line 6, in <module>
    from xlsx import xlsx2csv
ImportError: No module named xlsx

The pip install went fine, but the tests also fail because of the missing library.

Make --no-breaks a standard output flag

Strips inline newlines from fields so that the output can be piped through standard unix utilities such as head, tail, sort, grep, uniq, etc. Should work with all tools, but this doesn't deserve its own utility.
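The per-row transformation behind such a flag could be sketched like this. Replacing each line break with a space (rather than deleting it outright) is an assumption, as are the names `strip_breaks` and `replacement`.

```python
import re

# Matches Windows, old Mac and Unix line breaks.
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def strip_breaks(row, replacement=" "):
    """Replace line breaks inside each field so a row fits on one line."""
    return [LINE_BREAK.sub(replacement, field) for field in row]
```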

csvpy

Open a Python shell with CSV data already loaded into a Table or csv reader.

Optimize csvcut -n by bypassing type-inference

I kind of think csvcut -n should be fast. If inference were not the default, you could stop after reading the first row. What do you think about making inference happen only on request, or barring that, having an option that suppresses it?
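The "stop after the first row" idea can be sketched with the standard library alone. This is not how csvcut is implemented; `column_names` is a hypothetical name, and the 1-based numbering is an assumption.

```python
import csv
import io


def column_names(f):
    """Return (index, name) pairs by reading only the header row."""
    reader = csv.reader(f)
    # Only the first row is parsed; the rest of the file is never
    # read, so no type inference over the data is needed.
    header = next(reader, [])
    return list(enumerate(header, start=1))
```

For example, `column_names(io.StringIO("id,name\n1,a\n", newline=""))` returns the two header names paired with their positions.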

XLSX support in in2csv

Attempted this, but no available xlsx library for Python gracefully handles files produced on Windows, on Mac and by OpenOffice. openpyxl is the closest, but needs to be patched to support both OO-produced files and Mac date formats.
