Faster-than-CSV

Benchmark Results

Library               Time (Speed)
Pandas read_csv()     20.09
NumPy fromfile()      3.88
NumPy genfromtxt()    4.00
NumPy loadtxt()       1.26
csv (std lib)         0.40
csv (list)            0.38
csv (map)             0.37
Faster_than_csv       0.08
  • This CSV Lib is ~300 Lines of Code.
  • Benchmarks run on Docker from Dockerfile on this repo.
  • Speed is IRL time to complete 10000 CSV Parsings.
  • Lines Of Code counted using CLOC.
  • Direct dependencies of the package when ready to run.
  • Stats as of year 2021.
  • x86_64 64Bit AMD, SSD, Arch Artix Linux.
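
As a rough illustration of how a number like the ones above can be produced, here is a minimal sketch that times repeated CSV parsings with the Python timeit module. It uses only the standard library csv module on an in-memory sample; the sample data and the function names parse_once and time_parsings are assumptions made for this example, not the benchmark script from the repo.

```python
import csv
import io
import timeit

# Hypothetical in-memory sample; the real benchmark reads files inside Docker.
SAMPLE = "One,Two,Three,Four\n1,2,3,4\n10,20,30,40\n100,200,300,400\n"


def parse_once(text=SAMPLE):
    """Parse the sample CSV text into a list of rows with the std lib."""
    return list(csv.reader(io.StringIO(text)))


def time_parsings(number=10000):
    """Return the seconds timeit reports for `number` parsings."""
    return timeit.timeit(parse_once, number=number)


if __name__ == "__main__":
    # Prints the raw timeit output, a float number of seconds.
    print(time_parsings())
```

The result depends on the machine and the sample, so it is not comparable with the table above.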

Use

import faster_than_csv as csv

csv.csv2list("example.csv")                     # See Docs for more info.
                                                # Custom Separators supported.
csv.csv2json("example.csv", indentation=4)      # CSV to JSON, Pretty-Printed.

csv.csv2htmltable("example.csv")                # CSV to HTML+CSS Table (No JavaScript).

csv.read_clipboard()                            # CSV from the Clipboard.

csv.diff_csvs("example.csv", "anotherfile.csv") # Diff optimized for CSVs.
  • Input: CSV, TSV, Clipboard, File, URL, Custom.
  • Output: CSV, TSV, HTML, JSON, NDJSON, Diff, File, Custom.

csv2dict()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns a list of dictionaries. This is very similar to pandas.read_csv(filename).

Arguments:

  • csv_file_path path of the CSV file, str type, required, must not be empty string.
  • separator Separator character of the CSV data, str type, optional, defaults to ',', must not be empty string.
  • quote Quote character of the CSV data, str type, optional, defaults to '"', must not be empty string.

Returns: Data from the CSV, dict type.
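
To make the "list of dictionaries" shape concrete, here is a minimal sketch built on the standard library csv.DictReader. It is not the implementation of csv2dict(); the function name rows_as_dicts is hypothetical, and it takes CSV text instead of a file path so that it runs without any file on disk.

```python
import csv
import io


def rows_as_dicts(text, separator=",", quote='"'):
    """Return one dict per data row, keyed by the header row (std lib only)."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter=separator, quotechar=quote
    )
    return [dict(row) for row in reader]


if __name__ == "__main__":
    print(rows_as_dicts("One,Two\n1,2\n10,20\n"))
```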

csv2list()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns a list.

Arguments:

  • csv_file_path path of the CSV file, str type, required, must not be empty string.
  • separator Separator character of the CSV data, str type, optional, defaults to ',', must not be empty string.
  • quote Quote character of the CSV data, str type, optional, defaults to '"', must not be empty string.

Returns: Data from the CSV, list type.

read_clipboard()

Description: Reads a CSV string from the Clipboard, processes the CSV, and returns a list of dictionaries. This is very similar to pandas.read_clipboard(). This works on Linux, Mac, Windows.

Arguments:

  • separator Separator character of the CSV data, str type, optional, defaults to ',', must not be empty string.
  • quote Quote character of the CSV data, str type, optional, defaults to '"', must not be empty string.

Returns: Data from the CSV, dict type.

csv2json()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns JSON.

Arguments:

  • csv_file_path path of the CSV file, str type, required, must not be empty string.
  • separator Separator character of the CSV data, str type, optional, defaults to ',', must not be empty string.
  • quote Quote character of the CSV data, str type, optional, defaults to '"', must not be empty string.
  • indentation Pretty-Printed or Minified JSON output, int type, optional, 0 is Minified, 4 is Pretty-Printed, you can use any integer to adjust the indentation.

Returns: Data from the CSV as a JSON string, str type; with indentation 0 the output is a minified, single-line, computer-friendly string.
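
The effect of the indentation argument can be sketched with the standard library json module. This is an illustration under assumptions, not the code behind csv2json(): the function name csv_text_to_json is hypothetical, it takes CSV text instead of a path, and the exact output layout of the library may differ.

```python
import csv
import io
import json


def csv_text_to_json(text, separator=",", quote='"', indentation=0):
    """Convert CSV text to a JSON array of objects (std lib sketch)."""
    rows = list(
        csv.DictReader(io.StringIO(text), delimiter=separator, quotechar=quote)
    )
    if indentation > 0:
        # Pretty-printed: one key per line, indented by `indentation` spaces.
        return json.dumps(rows, indent=indentation)
    # Minified: no spaces after separators, everything on one line.
    return json.dumps(rows, separators=(",", ":"))


if __name__ == "__main__":
    print(csv_text_to_json("One,Two\n1,2\n"))
    print(csv_text_to_json("One,Two\n1,2\n", indentation=4))
```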

csv2ndjson()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns NDJSON.

Arguments:

  • csv_file_path path of the CSV file, str type, required, must not be empty string.
  • ndjson_file_path path of the NDJSON file, str type, required, must not be empty string.
  • separator Separator character of the CSV data, str type, optional, defaults to ',', must not be empty string.
  • quote Quote character of the CSV data, str type, optional, defaults to '"', must not be empty string.

Returns: None. The data from the CSV is written as NDJSON (https://github.com/ndjson/ndjson-spec) to ndjson_file_path.
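
NDJSON means one JSON value per line, each line ending with a newline. A minimal sketch with the standard library, assuming each CSV row becomes one JSON object keyed by the header; the function name csv_text_to_ndjson is hypothetical and it returns a string instead of writing to a file.

```python
import csv
import io
import json


def csv_text_to_ndjson(text, separator=",", quote='"'):
    """Convert CSV text to NDJSON: one compact JSON object per line."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter=separator, quotechar=quote
    )
    return "".join(
        json.dumps(row, separators=(",", ":")) + "\n" for row in reader
    )


if __name__ == "__main__":
    print(csv_text_to_ndjson("One,Two\n1,2\n10,20\n"), end="")
```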

csv2htmltable()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns the data rendered as an HTML Table.

Arguments:

  • csv_file_path path of the CSV file, str type, required, must not be empty string.
  • html_file_path path of the HTML file, str type, optional, can be empty string, defaults to "", if it is an empty string then no file is written.
  • separator Separator character of the CSV data, str type, optional, defaults to ',', must not be empty string.
  • quote Quote character of the CSV data, str type, optional, defaults to '"', must not be empty string.
  • header_html HTML Header, str type, optional, defaults to Bulma CSS, can be empty string.

Returns: Data from the CSV as HTML Table, str type, raw HTML (no style at all).
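
What "CSV rendered as an HTML table" can look like is sketched below with the standard library only. This is an assumption-laden illustration, not the output of csv2htmltable(): the function name csv_text_to_html_table is hypothetical, the first row is assumed to be the header, and no CSS header is added.

```python
import csv
import html
import io


def csv_text_to_html_table(text, separator=",", quote='"'):
    """Render CSV text as a bare HTML table, escaping every cell."""
    rows = list(
        csv.reader(io.StringIO(text), delimiter=separator, quotechar=quote)
    )
    if not rows:
        return "<table></table>"
    head, body = rows[0], rows[1:]
    parts = ["<table>", "<thead><tr>"]
    parts += [f"<th>{html.escape(cell)}</th>" for cell in head]
    parts.append("</tr></thead>")
    parts.append("<tbody>")
    for row in body:
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        parts.append(f"<tr>{cells}</tr>")
    parts += ["</tbody>", "</table>"]
    return "".join(parts)


if __name__ == "__main__":
    print(csv_text_to_html_table("A,B\n1,<2>\n"))
```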

csv2karax()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns the data rendered as a Karax HTML Table.

Arguments:

  • csv_file_path path of the CSV file, str type, required, must not be empty string.
  • separator Separator character of the CSV data, str type, optional, defaults to ',', must not be empty string.
  • quote Quote character of the CSV data, str type, optional, defaults to '"', must not be empty string.

Returns: Karax DSL, str type.

csv2terminal()

Description: Takes the path of a CSV file as a string, processes the CSV, and prints a colored pretty-printed table to the terminal.

Arguments:

  • csv_file_path path of the CSV file, str type, required, must not be empty string.
  • column_width column width of the widest column, required, int type, must not be 0, must not be negative.
  • separator Separator character of the CSV data, str type, optional, defaults to ',', must not be empty string.
  • quote Quote character of the CSV data, str type, optional, defaults to '"', must not be empty string.

Returns: None.

csv2xml()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns a valid XML string. Output is guaranteed to always be valid XML.

Arguments:

  • csv_file_path path of the CSV file, str type, required, must not be empty string.
  • separator Separator character of the CSV data, str type, optional, defaults to ',', must not be empty string.
  • quote Quote character of the CSV data, str type, optional, defaults to '"', must not be empty string.
  • header_xml XML Header of the XML string, str type, optional, can be empty string, defaults to "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n".

Returns: XML, str type.
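
One way to get XML that stays well-formed regardless of cell content is to let an XML library do the escaping. The sketch below uses xml.etree.ElementTree from the standard library; the element names csv, row and field, and the function name csv_text_to_xml, are assumptions for this example and are not taken from the library's actual output.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_text_to_xml(text, separator=",", quote='"'):
    """Convert CSV text to an XML string; the first row is the header."""
    rows = list(
        csv.reader(io.StringIO(text), delimiter=separator, quotechar=quote)
    )
    root = ET.Element("csv")
    header = rows[0] if rows else []
    for row in rows[1:]:
        item = ET.SubElement(root, "row")
        # zip() stops at the shorter side, so ragged rows are truncated.
        for name, value in zip(header, row):
            field = ET.SubElement(item, "field", name=name)
            field.text = value
    # ElementTree escapes &, < and > in text and attributes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(csv_text_to_xml("One,Two\n1,a&b\n"))
```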

tsv2csv()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns a TSV.

Arguments:

  • csv_file_path path of the CSV file, str type, required, must not be empty string.
  • separator1 Separator character of the CSV data, str type, optional, must not be empty string.
  • separator2 Separator character of the CSV data, str type, optional, must not be empty string.
  • quote Quote character of the CSV data, str type, optional, defaults to '"', must not be empty string.

Returns: Data from the CSV as TSV, str type.
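
Converting between separators can be sketched with the standard library csv reader and writer. This assumes separator1 is the separator of the input and separator2 the separator of the output, which the argument list above does not state; the function name convert_separator is hypothetical.

```python
import csv
import io


def convert_separator(text, separator1="\t", separator2=",", quote='"'):
    """Re-write delimited text from separator1 to separator2 (assumed roles)."""
    out = io.StringIO()
    writer = csv.writer(
        out, delimiter=separator2, quotechar=quote, lineterminator="\n"
    )
    reader = csv.reader(
        io.StringIO(text), delimiter=separator1, quotechar=quote
    )
    # Fields that contain the new separator get quoted by the writer.
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    print(convert_separator("a\tb\n1\t2,3\n"), end="")
```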

diff_csvs()

Description: Takes 2 paths of 2 CSV files, processes the CSVs, and returns the Diff of the 2 CSVs.

Arguments:

  • csv_file_path0 path of the CSV file, str type, required, must not be empty string, file must exist.
  • csv_file_path1 path of the CSV file, str type, required, must not be empty string, file must exist.

Returns: Diff.
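
The format of the Diff is not described here. As a generic illustration of diffing two CSVs line by line, here is a sketch with difflib from the standard library; it is not the CSV-optimized diff of diff_csvs(), and the function name diff_csv_texts and the default labels are hypothetical.

```python
import difflib


def diff_csv_texts(old, new, old_name="a.csv", new_name="b.csv"):
    """Return a unified diff of two CSV texts, or "" when they are equal."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=old_name,
            tofile=new_name,
        )
    )


if __name__ == "__main__":
    print(diff_csv_texts("A,B\n1,2\n", "A,B\n1,3\n"), end="")
```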

For more Examples check the Examples and Tests.

Instead of having a pair of functions with a lot of arguments that you should provide to make it work, we have tiny functions with very few arguments that do one thing and do it as fast as possible.

Install

  • pip install faster_than_csv

Docker

  • Take a quick test drive on Docker!
$ ./build-docker.sh
$ ./run-docker.sh
$ ./run-benchmark.sh  # Inside Docker.

Dependencies

  • None

Platforms

  • ✅ Linux
  • ✅ Windows
  • ✅ Mac
  • ✅ Android
  • ✅ Raspberry Pi
  • ✅ BSD

Requisites

  • Python 3.6+ 64Bit.

Windows

  • If installation fails on Windows, just use the Source Code.

FAQ

  • What's the idea, inspiration, reason, etc.?

Feel free to Fork, Clone, Download, Improve, Reimplement, Play with this Open Source. Make it 10 times faster, 10 times smaller.

  • Does this require Cython?

No.

  • Does this run on PyPy?

No.

  • Does this run on Python 2?

I dunno. (Not supported)

  • How can I install it?

https://github.com/juancarlospaco/faster-than-csv/releases

If you don't understand how to install it, you can just download, extract, and put the files in the same folder as your *.py file, and you are good to go.

  • How can it be faster than NumPy?

I dunno.

  • How can it be faster than Pandas?

I dunno.

  • Why does it need 64Bit?

Maybe it works on 32Bit, but it is not supported, integer sizes are too small, and performance can be worse.

  • Why does it need Python 3?

Maybe it works on Python 2, but it is not supported, and performance can be worse; we suggest migrating to Python 3.

  • Can I wrap the functions in a try: except: block?

Functions do not have internal try: except: blocks, so you can wrap them inside try: except: blocks if you need very resilient code.
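
A minimal sketch of such a wrapper follows. Which exceptions the library raises is not documented here, so the sketch catches Exception broadly; the helper names parse_or_default and stdlib_parser are hypothetical, and a standard library parser stands in for the real function so the example runs anywhere.

```python
import csv


def parse_or_default(parser, path, default=None):
    """Call parser(path); on any error, report it and return `default`."""
    try:
        return parser(path)
    except Exception as error:  # Broad on purpose: error types are unknown.
        print(f"Could not parse {path!r}: {error}")
        return default


def stdlib_parser(path):
    """Stand-in parser using the std lib csv module."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    # A missing file falls back to an empty list instead of crashing.
    print(parse_or_default(stdlib_parser, "missing-file.csv", default=[]))
```

Any callable that takes a path can be passed as parser.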

  • PIP fails to install or fails to build the wheel?

Add at the end of the PIP install command:

--isolated --disable-pip-version-check --no-cache-dir --no-binary :all:

Not my Bug.

  • How to build the project?

build.sh

  • How to package the project?

package.sh

  • Does this require Nimble?

No.

  • What's the unit of measurement for speed?

Unmodified raw output of the Python timeit module.

Please send a Pull Request to Python to improve the output of timeit.

Send Crypto, request features, donate today

Bitcoin BTC

BEP20 Binance Smart Chain Network BSC

0xb78c4cf63274bb22f83481986157d234105ac17e

BTC Bitcoin Network

1Pnf45MgGgY32X4KDNJbutnpx96E4FxqVi
Ethereum ETH Dai DAI Uniswap UNI Axie Infinity AXS Smooth Love Potion SLP

BEP20 Binance Smart Chain Network BSC

0xb78c4cf63274bb22f83481986157d234105ac17e

ERC20 Ethereum Network

0xb78c4cf63274bb22f83481986157d234105ac17e
Tether USDT

BEP20 Binance Smart Chain Network BSC

0xb78c4cf63274bb22f83481986157d234105ac17e

ERC20 Ethereum Network

0xb78c4cf63274bb22f83481986157d234105ac17e

TRC20 Tron Network

TWGft53WgWvH2mnqR8ZUXq1GD8M4gZ4Yfu
Solana SOL

BEP20 Binance Smart Chain Network BSC

0xb78c4cf63274bb22f83481986157d234105ac17e

SOL Solana Network

FKaPSd8kTUpH7Q76d77toy1jjPGpZSxR4xbhQHyCMSGq
Cardano ADA

BEP20 Binance Smart Chain Network BSC

0xb78c4cf63274bb22f83481986157d234105ac17e

ADA Cardano Network

DdzFFzCqrht9Y1r4Yx7ouqG9yJNWeXFt69xavLdaeXdu4cQi2yXgNWagzh52o9k9YRh3ussHnBnDrg7v7W2hSXWXfBhbo2ooUKRFMieM
Sandbox SAND Decentraland MANA

ERC20 Ethereum Network

0xb78c4cf63274bb22f83481986157d234105ac17e
Algorand ALGO

ALGO Algorand Network

WM54DHVZQIQDVTHMPOH6FEZ4U2AU3OBPGAFTHSCYWMFE7ETKCUUOYAW24Q
Binance

https://pay.binance.com/en/checkout/e92e536210fd4f62b426ea7ee65b49c3

faster-than-csv's Issues

Error at building docker image for benchmarking.

Describe the bug
/tmp/faster_than_csv.nim(17, 92) template/generic instantiation of exportpy from here
/tmp/faster_than_csv.nim(16, 1) template/generic instantiation of exportpyAuxAux from here
/tmp/faster_than_csv.nim(19, 15) Error: undeclared identifier: 'CsvParser'
The command '/bin/sh -c nim c -d:release --app:lib --passL:"-s" --gc:markAndSweep --passC:"-march=native" --passC:"-flto" --passC:"-ffast-math" --out:/tmp/faster_than_csv.so /tmp/faster_than_csv.nim' returned a non-zero code: 1

To Reproduce
Just build the docker.

Fix:
I don't have a fix for this yet.

faster_than_csv.csv2list() has outdated documentation

Describe the bug
The documentation for csv2list says that the function has the following signature:

csv2list(csv_file_path: string; has_header: bool = true; separator: char = ',';
             quote: char = '\"'; skipInitialSpace: bool = false; verbose: bool = false): seq[
    string] {...}

To Reproduce
Steps to reproduce the behavior:

  1. pip install faster_than_csv
  2. create a new python file and import faster_than_csv as csv
  3. Load a csv file following example : csv.csv2list("example.csv")
  4. See error : TypeError: csv2list() takes exactly 7 arguments (1 given)
  5. After reading the documentation for csv2list I noticed that only 6 arguments are in the function signature, so I tried to call csv.csv2list("example.csv", **kwargs) with all the keyword arguments from the documentation. This time I got the following error: TypeError: csv2list() takes exactly 7 arguments (6 given)

Expected behavior
The expected behaviour was that a plain call to csv.csv2list("example.csv") should work, as all the keyword arguments have default values.
It is even more troubling that csv.csv2list("example.csv", **kwargs) with all the keyword arguments from the documentation does not work either.

What is the proper way to call faster_than_csv.csv2list()?

Desktop (please complete the following information):

  • OS: [Windows 10, Ubuntu 20.04]
  • Version [21.03.03]

Additional context
Can you please tell me what is the proper way to call faster_than_csv.csv2list() and maybe update the documentation to make it more obvious how to use the library

Thank you in advance

@SekouDiaoNlp.

Unable to install faster-than-csv

Describe the bug
I'm installing it in my local machine using pip, pipenv and getting the following errors.

Expected behavior
It should install like the other packages I install.

Desktop

  • OS: Microsoft Windows 10 Home Single Language
  • Browser chrome
  • Version 10.0.19042 N/A Build 19042

No line identification?

>>> import faster_than_csv as csv
>>> csv.csv2list("sample.csv")
['1', '2', '3', '4', '10', '20', '30', '40', '100', '200', '300', '400']

Is it a planned feature that all columns end up in a single list, regardless of rows?
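
If the flat list is what you get, the rows can be rebuilt in plain Python when the number of columns is known. A minimal sketch, assuming 4 columns as in the output above; the helper name chunk_rows is hypothetical and not part of the library.

```python
def chunk_rows(values, columns):
    """Split a flat list of cells into rows of `columns` cells each."""
    if columns <= 0:
        raise ValueError("columns must be a positive integer")
    return [values[i:i + columns] for i in range(0, len(values), columns)]


if __name__ == "__main__":
    flat = ['1', '2', '3', '4', '10', '20', '30', '40',
            '100', '200', '300', '400']
    print(chunk_rows(flat, 4))
```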

Cumulative fields in json?

>>> import faster_than_csv as csv
>>> csv.csv2json("sample.csv")
['{"One":"1"}', '{"One":"1"}{"Two":"2"}', '{"One":"1"}{"Two":"2"}{"Three":"3"}', '{"One":"1"}{"Two":"2"}{"Three":"3"}{"Four":"4"}', '{"One":"10"}', '{"One":"10"}{"Two":"20"}', '{"One":"10"}{"Two":"20"}{"Three":"30"}', '{"One":"10"}{"Two":"20"}{"Three":"30"}{"Four":"40"}', '{"One":"100"}', '{"One":"100"}{"Two":"200"}', '{"One":"100"}{"Two":"200"}{"Three":"300"}', '{"One":"100"}{"Two":"200"}{"Three":"300"}{"Four":"400"}']

Is this behaviour planned? Fields "joined" cumulatively?

It doesn't run

README.md says:

If you dont understand how to install it, you can just download, extract, put the files on the same folder as your *.py file and you are good to go.

Doing that, I got the following message:

Traceback (most recent call last):
  File "csv.py", line 1, in <module>
    import faster_than_csv as csv
ImportError: dlopen(/Users/viniciusban/tmp/faster_than_csv/faster_than_csv.so, 2): no suitable image found.  Did find:
	/Users/viniciusban/tmp/faster_than_csv/faster_than_csv.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
	/Users/viniciusban/tmp/faster_than_csv/faster_than_csv.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00

Trying with Python 3.6.5 on MacOS 10.13.6 with a fresh new virtualenv.

Feature RST

  • Convert CSV data to Restructuredtext table RST.

Pip install error - Syntax error in `faster_than_csv.nim.c`

Platform: Windows 10 (64-bit)
Python version: 3.8

pip install faster_than_csv --isolated --disable-pip-version-check --no-cache-dir --no-binary :all:
Collecting faster_than_csv
  Downloading faster_than_csv-0.9.zip (181 kB)
     |████████████████████████████████| 181 kB 3.3 MB/s
Installing collected packages: faster-than-csv
    Running setup.py install for faster-than-csv ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\calle\documents\github\marrmot\.venv\scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Calle\\AppData\\Local\\Temp\\pip-install-csfhbi4j\\faster-than-csv\\setup.py'"'"'; __file__='"'"'C:\\Users\\Calle\\AppData\\Local\\Temp\\pip-install-csfhbi4j\\faster-than-csv\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --no-user-cfg install --record 'C:\Users\Calle\AppData\Local\Temp\pip-record-3q3bot33\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\calle\documents\github\marrmot\.venv\include\site\python3.8\faster-than-csv'
         cwd: C:\Users\Calle\AppData\Local\Temp\pip-install-csfhbi4j\faster-than-csv\
    Complete output (24 lines):
    running install
    running build
    running build_ext
    building 'faster_than_csv' extension
    creating build
    creating build\temp.win-amd64-3.8
    creating build\temp.win-amd64-3.8\Release
    C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -Ic:\users\calle\documents\github\marrmot\.venv\include -IC:\Users\Calle\AppData\Local\Programs\Python\Python38\include -IC:\Users\Calle\AppData\Local\Programs\Python\Python38\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /Tcfaster_than_csv.nim.c /Fobuild\temp.win-amd64-3.8\Release\faster_than_csv.nim.obj -flto -ffast-math -march=native -mtune=native -O3 -fno-ident -fsingle-precision-constant
    cl : Command line warning D9002 : ignoring unknown option '-flto'
    cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
    cl : Command line warning D9002 : ignoring unknown option '-march=native'
    cl : Command line warning D9002 : ignoring unknown option '-mtune=native'
    cl : Command line warning D9002 : ignoring unknown option '-O3'
    cl : Command line warning D9002 : ignoring unknown option '-fno-ident'
    cl : Command line warning D9002 : ignoring unknown option '-fsingle-precision-constant'
    faster_than_csv.nim.c
    faster_than_csv.nim.c(2985): error C2143: syntax error: missing ')' before '('
    faster_than_csv.nim.c(2985): error C2059: syntax error: ')'
    faster_than_csv.nim.c(2985): error C2146: syntax error: missing ')' before identifier 'NimMainInit'
    faster_than_csv.nim.c(2985): error C2091: function returns function
    faster_than_csv.nim.c(2985): error C2061: syntax error: identifier 'NimMainInit'
    faster_than_csv.nim.c(2985): error C2059: syntax error: ';'
    faster_than_csv.nim.c(2985): error C2059: syntax error: '<parameter-list>'
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\BuildTools\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\calle\documents\github\marrmot\.venv\scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Calle\\AppData\\Local\\Temp\\pip-install-csfhbi4j\\faster-than-csv\\setup.py'"'"'; __file__='"'"'C:\\Users\\Calle\\AppData\\Local\\Temp\\pip-install-csfhbi4j\\faster-than-csv\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --no-user-cfg install --record 'C:\Users\Calle\AppData\Local\Temp\pip-record-3q3bot33\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\calle\documents\github\marrmot\.venv\include\site\python3.8\faster-than-csv' Check the logs for full command output.

--threads:on needed for docker build in Dockerfile line 11

Describe the bug
/nim/lib/pure/concurrency/threadpool.nim(21, 10) Error: Threadpool requires --threads:on option.
The command '/bin/sh -c nim c -d:release -d:ssl --app:lib --passL:"-s" --gc:markAndSweep --passC:"-march=native" --passC:"-flto" --passC:"-ffast-math" --out:/tmp/faster_than_requests.so /tmp/faster_than_requests.nim' returned a non-zero code: 1

To Reproduce
Just build the docker.

Fix:
Change line 11 in the Dockerfile to add --threads:on, like this:
RUN nim c -d:release -d:ssl --app:lib --passL:"-s" --gc:markAndSweep --passC:"-march=native" --passC:"-flto" --passC:"-ffast-math" --threads:on --out:/tmp/faster_than_requests.so /tmp/faster_than_requests.nim

xml json yaml?

Are you going to make support for HTML, XML, JSON, iCal, JS, XLSX, XLS, YAML, Google Spreadsheets? That would be great. Or just an additional method to convert to these formats.
