jlehrer1 / big-csv
A Python script to manipulate large csv/tsv files that can't fit in memory
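This is not necessarily how big-csv works internally. The usual pandas pattern for files that don't fit in memory is to pass chunksize to pd.read_csv, which returns an iterator of smaller DataFrames. Below is a minimal sketch, assuming you only want to keep some columns of a CSV or TSV. The function name select_columns is hypothetical.

```python
import io

import pandas as pd


def select_columns(src, out, columns, sep=",", chunksize=100_000):
    """Copy only `columns` from src to the open text handle `out`, chunk by chunk.

    Hypothetical helper for illustration; only one chunk is in memory at a time.
    """
    first = True
    # dtype=str keeps values as text instead of converting them to numbers.
    reader = pd.read_csv(src, sep=sep, usecols=columns, dtype=str, chunksize=chunksize)
    for chunk in reader:
        # Write the header row once, then only data rows for later chunks.
        chunk.to_csv(out, sep=sep, index=False, header=first)
        first = False


# Small in-memory demo; for real files, pass a path and an open file handle.
tsv = io.StringIO("a\tb\tc\n1\t2\t3\n4\t5\t6\n")
buf = io.StringIO()
select_columns(tsv, buf, ["a", "c"], sep="\t", chunksize=1)
print(buf.getvalue())
```

Passing sep="\t" is what makes the same code handle TSV. Note that usecols keeps columns in file order, not in the order they are listed.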
It's easily solved by just running "pip install anndata", but it might confuse non-techies.
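A missing package like this usually shows up as a ModuleNotFoundError traceback, and that traceback is what tends to confuse people. Below is a minimal sketch of turning it into a one-line install hint, assuming the script imports anndata directly. The helper name require is hypothetical and not part of big-csv.

```python
import importlib
import sys
from types import ModuleType
from typing import Optional


def require(module_name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Import a module, or exit with an install hint instead of a traceback.

    Hypothetical helper; pip_name covers packages whose pip name differs.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        hint = pip_name or module_name
        sys.exit(f"Missing dependency '{module_name}'. Install it with: pip install {hint}")


# Usage, assuming the script needs anndata:
# anndata = require("anndata")
```

If the project has a requirements file or package metadata, listing anndata there would avoid the error in the first place.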
It appears that pandas eats the first row when building the dataframe. I'm not very familiar with pandas, but I believe that's normal behavior: by default, it uses the first row as the dataframe's column names.
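This is pandas' default. pd.read_csv uses header="infer", which treats the first line as column names when no names are passed. The small sketch below compares that default with header=None, which keeps every line as data.

```python
import io

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6\n"

# Default header="infer": the first line becomes the column labels,
# so only two data rows remain.
inferred = pd.read_csv(io.StringIO(data))
print(list(inferred.columns))  # ['a', 'b', 'c']
print(len(inferred))           # 2

# header=None: every line is data, and columns are labelled 0..n-1.
raw = pd.read_csv(io.StringIO(data), header=None)
print(list(raw.columns))       # [0, 1, 2]
print(len(raw))                # 3
```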
On output, the first row is a series of integers, which I'm guessing is the dataframe's default row index (after transposing, the row index becomes the column labels). The second row is the first column of the input. The first column of the output is actually the second row of the input.
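The described output can be reproduced with a three-line file. This assumes the script reads with pandas' defaults and writes the transposed frame with index=False; the script itself isn't shown here, so that part is an assumption. The integers are the original row labels 0..n-1, which become column labels after .T.

Below is a minimal sketch of a transpose that keeps the first row as data. The name transpose_csv_text is hypothetical, and the function works entirely in memory, so it only illustrates which read_csv/to_csv flags matter.

```python
import io

import pandas as pd


def transpose_csv_text(text: str) -> str:
    """Hypothetical in-memory transpose that does not lose the first row."""
    # header=None: keep the first line as data instead of column labels.
    # dtype=str: avoid converting values to numbers on the way through.
    df = pd.read_csv(io.StringIO(text), header=None, dtype=str)
    # header=False, index=False: don't write pandas' integer labels back out.
    return df.T.to_csv(header=False, index=False)


data = "a,b,c\n1,2,3\n4,5,6\n"

# The reported symptom, under the assumption stated above:
symptom = pd.read_csv(io.StringIO(data)).T.to_csv(index=False)
print(symptom.splitlines())  # ['0,1', '1,4', '2,5', '3,6']

print(transpose_csv_text(data).splitlines())  # ['a,1,4', 'b,2,5', 'c,3,6']
```

read_csv also accepts header=None together with chunksize, so the same flags should carry over to a chunked implementation.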
I circumvented this with:

    head -1 input_file >heading_file
    cat heading_file input_file >temp_file
Then I ran transpose_csv on temp_file and got the first column right. Running tail -n +2 on the transposed temp_file (it prints everything from the second line onward) should get rid of the "mystery" row and produce the desired result.
If you're on Linux or macOS, you should be able to do a little piping to automate this. If you're on Windoze ... well, you shouldn't be ;-)
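The head/cat/tail workaround can also be scripted without shell pipes, for example on Windows. Below is a minimal Python sketch that streams each step instead of loading the whole file into memory. Here, transpose stands in for however you invoke big-csv's transposition. Its exact command line isn't shown here, so transpose is a hypothetical callable taking (input_path, output_path).

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def duplicate_first_line(src: Path, dst: Path) -> None:
    """Same as: head -1 src >heading_file; cat heading_file src >dst"""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        first = fin.readline()
        fout.write(first)
        fout.write(first)
        shutil.copyfileobj(fin, fout)  # stream the remaining lines


def drop_first_line(src: Path, dst: Path) -> None:
    """Same as: tail -n +2 src >dst"""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.readline()  # skip the "mystery" integer row
        shutil.copyfileobj(fin, fout)


def transpose_keeping_header(
    src: Path, dst: Path, transpose: Callable[[Path, Path], None]
) -> None:
    """Run the full workaround; `transpose` is a hypothetical stand-in for big-csv."""
    with tempfile.TemporaryDirectory() as tmp:
        padded = Path(tmp) / "temp_file"
        transposed = Path(tmp) / "transposed_file"
        duplicate_first_line(src, padded)
        transpose(padded, transposed)
        drop_first_line(transposed, dst)
```

The duplicated first line is what the header-consuming read throws away, so the real first row survives as data. The final trim then removes the integer row that the transpose adds.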
Thanks for doing big-csv: I didn't know enough about pandas to know it could be used, and thought I was going to have to write my own transposition code. It's a real time saver.
I'm guessing that the name was changed at some point, but the documentation wasn't updated.