aotimme / gocsv
Command-line CSV processing utility.
License: MIT License
The broader logic of Cap implicitly allows the user to leave the --names flag empty and specify only --default-name:
If names happens to be an empty slice, then numNames is 0. The first half of the predicate is true, but if defaultName is not empty then Cap doesn't error out:
Lines 49 to 52 in f9d4372
Then, with numNames being 0, the code flows through the else branch and builds the header exclusively from defaultName:
Lines 60 to 71 in f9d4372
And I would like this behavior: if I have a headerless CSV with any number of columns that I want to feed into another GoCSV command, I want to be able to easily cap it with --default-name=Col.
But, prior to that, Cap tries to create the slice of strings, names, on the assumption that the --names flag is not an empty string:
Lines 37 to 40 in f9d4372
The call to GetArrayFromCsvString(...) will panic if the passed string is empty.
I propose skipping that call when --names is empty, so that --default-name alone is enough.
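A minimal sketch of the idea, using strings.Split as a stand-in for GetArrayFromCsvString; the function name, error message, and column-numbering scheme are illustrative, not the actual Cap source:

```go
package main

import (
	"fmt"
	"strings"
)

// buildCapHeader sketches the proposed logic: when --names is empty,
// skip parsing entirely and fall back to numbered default names.
func buildCapHeader(namesString, defaultName string, numColumns int) ([]string, error) {
	var names []string
	if namesString != "" { // guard: don't parse an empty --names string
		names = strings.Split(namesString, ",")
	}
	if len(names) == 0 && defaultName == "" {
		return nil, fmt.Errorf("must specify --names and/or --default-name")
	}
	header := make([]string, numColumns)
	for i := 0; i < numColumns; i++ {
		if i < len(names) {
			header[i] = names[i]
		} else {
			header[i] = fmt.Sprintf("%s%d", defaultName, i+1)
		}
	}
	return header, nil
}

func main() {
	header, err := buildCapHeader("", "Col", 3)
	fmt.Println(header, err)
}
```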
gocsv add assumes the input has a header row, which isn't spelled out in the documentation; it took me a minute to figure out why this wasn't working.
A sample CSV:
% cat test.csv
1a
2a
3a
Naive attempt:
% gocsv add -t 'foo' test.csv
1a,
2a,foo
3a,foo
cap then add:
% gocsv cap -names 'C1' test.csv | gocsv add -t 'foo'
C1,
1a,foo
2a,foo
3a,foo
that works, so finally:
% gocsv cap -names 'C1' test.csv | gocsv add -t 'foo' | gocsv behead
1a,foo
2a,foo
3a,foo
So, maybe spell out that add assumes a header, and to pipe from cap
if your data doesn't already have one?
Hello there, I have a CSV file with "\x01" as the delimiter and I want to use gocsv to change it to another delimiter such as "," or "\t".
I have tried head input | gocsv delim -i "\x01" -o "\t" > output
but nothing changed.
What should I do?
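My guess (an assumption, not confirmed in the thread) is that the shell is passing the four literal characters \x01 rather than the control character; in bash you can pass the real byte with ANSI-C quoting, i.e. -i $'\x01'. The conversion itself is straightforward, as this Go sketch with encoding/csv shows (illustrative, not gocsv's implementation):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// convertDelim re-reads CSV data with one delimiter and re-writes it
// with another. Illustrative sketch, not gocsv's code.
func convertDelim(input string, in, out rune) (string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = in // e.g. '\x01'
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	w.Comma = out // e.g. '\t'
	rows, err := r.ReadAll()
	if err != nil {
		return "", err
	}
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	got, err := convertDelim("a\x01b\n1\x012\n", '\x01', '\t')
	fmt.Printf("%q %v\n", got, err)
}
```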
Hi,
I am trying to install gocsv on my Ubuntu 14.04 instance but am experiencing permission issues.
I was executing the following command:
/bin/bash <(curl -s https://raw.githubusercontent.com/DataFoxCo/gocsv/latest/scripts/install-latest-darwin-amd64.sh)
That means it is trying to write to /usr/local/bin/gocsv.
What I did was create a folder gocsv in /usr/local/bin and then change the permissions as below:
sudo mkdir -p /usr/local/bin/gocsv
sudo chmod -R 777 gocsv
/bin/bash <(curl -s https://raw.githubusercontent.com/DataFoxCo/gocsv/latest/scripts/install-latest-darwin-amd64.sh)
sudo chown -R mtaziz:mtaziz gocsv
Following the above, it installed successfully, but when I run gocsv help, it says Permission denied.
Any help in this regard would be highly appreciated.
Thank you.
I'm in a situation where I want to take an existing CSV and add a few empty columns with specific names to it. The CSV is an inventory of website URLs, and I want to add some columns to it like "Up To Date", "Reviewed", "Delete", etc. for an audit spreadsheet. I am currently accomplishing this with --template like so:
gocsv template -t " " --name "Reviewed"
but it seems a little hacky to have to specify a value there, when all I really want is a new empty column with a specific name. It would be nice to be able to do something like
gocsv add --name "Reviewed"
and be done with it.
Thanks for a great tool!
If I have a csv file such as:
Test file: test.csv
cat test.csv
ContactName,EmailAddress
Run the following:
../gocsv select --columns ContactName test.csv
panic: Could not find header "ContactName"
goroutine 1 [running]:
main.GetIndicesForColumnsOrPanic(0xc42000a240, 0x2, 0x2, 0xc420062550, 0x1, 0x1, 0xffffffffffffffff, 0x1, 0x1)
/Users/alden/gocsv/src/utils.go:18 +0xc1
main.SelectColumns(0x5b4800, 0xc42000a220, 0xc420062550, 0x1, 0x1)
/Users/alden/gocsv/src/select.go:106 +0x18d
main.(*SelectSubcommand).Run(0xc42000a160, 0xc42000e180, 0x1, 0x1)
/Users/alden/gocsv/src/select.go:43 +0x133
main.main()
/Users/alden/gocsv/src/gocsv.go:96 +0x22d
As can be seen, select cannot find the first column.
If I change test.csv to:
cat test.csv
fred,ContactName,EmailAddress
Then run
../gocsv select --columns ContactName test.csv
ContactName
Now that ContactName is the second column, the select works.
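One likely cause (my assumption; the report doesn't confirm it) is a UTF-8 byte-order mark at the start of the file: the invisible BOM bytes get glued onto the first header name, so a plain comparison against "ContactName" fails for the first column only. A quick Go illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// stripBOM removes a leading UTF-8 byte-order mark. A BOM-prefixed header
// reads as "\uFEFFContactName", so an exact string match against
// "ContactName" fails for the first column only.
func stripBOM(s string) string {
	return strings.TrimPrefix(s, "\uFEFF")
}

func main() {
	header := "\uFEFFContactName" // what a naive reader sees in column one
	fmt.Println(header == "ContactName")           // false: BOM bytes differ
	fmt.Println(stripBOM(header) == "ContactName") // true
}
```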
The program jq offers an option to output the raw value, which is really useful in bash scripting. It would be nice if gocsv also had this, to output a column without CSV formatting (quotes, and newlines as \n) to allow querying and extracting data.
Every error produced by gocsv also dumps a stack trace to the console.
The stack trace is great for debugging but annoying when using the tool and making simple mistakes. It makes it hard to see the actual error message.
It would be nice if the stack trace were suppressed unless a switch (--debug) is added to the command line.
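A minimal sketch of the idea, converting a panic into a plain error message and appending the stack trace only on request; the function and flag name are illustrative, not a patch against gocsv's main:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// runSafely converts a panic into a plain error message, appending the
// stack trace only when debugMode (e.g. a --debug switch) is set.
func runSafely(f func(), debugMode bool) (errMsg string, failed bool) {
	defer func() {
		if r := recover(); r != nil {
			errMsg = fmt.Sprint("Error: ", r)
			if debugMode {
				errMsg += "\n" + string(debug.Stack())
			}
			failed = true
		}
	}()
	f()
	return
}

func main() {
	msg, _ := runSafely(func() { panic(`Could not find header "ContactName"`) }, false)
	fmt.Println(msg) // concise error, no goroutine dump
}
```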
Hi,
Can I suggest an improvement to the release practice? Currently, the only release (described as 'Latest Release') dates from 2016. The documented way to make a release is to edit this release and update the binaries, which I think is not ideal. It is not clear how outdated the binaries actually are, which already caused confusion. Not to mention older releases are lost.
If we make new tagged releases from time to time, it should still be possible to refer to the latest release, e.g. in the macOS install script, as this page details.
Although I don't have permission to actually make a release, I'd be happy to submit a pull request with some changes. Here's where I think updates might be required:
Hello,
First, many thanks for this great tool!
I face an issue using the sql subcommand with column names that contain square brackets:
# echo -e "id,f[0],f[1]\n1,2,3" | gocsv sql -q "select * from stdin" -
# Error: unrecognized token: "]"
Is there a solution?
When a CSV file has an escaped quote, gocsv returns a parse error.
$ cat sample.csv | gocsv filter -c 2 -eq "af4ba48d_wp[af4ba48d_wp] @ localhost []" | gocsv filter -c 5 -eq "Query" | gocsv select -c 6
Error: parse error on line 3, column 140: extraneous or missing " in quoted-field
SELECT option_value FROM wp_options WHERE option_name = 'disabled_hit_count' LIMIT 1
SHOW FULL COLUMNS FROM `wp_options`
I'm not sure if quote escapes like this are something you would want to handle, as the standard way to escape quotes in CSV is to just repeat the quote twice, like "", as is done in Excel. This CSV could easily be translated to that with sed 's/\\\\\\"/""/g', but I thought I'd mention it in case it's something you see fit to handle, as the MySQL CSV engine seems to escape quotes in that manner.
I forgot to pass the column switch to the join and got an index out of range.
It should state that a required switch is missing.
gocsv join --left services_with_id.csv iahproducts.csv
panic: runtime error: index out of range
goroutine 1 [running]:
main.GetArrayFromCsvString(0x0, 0x0, 0xc42000a132, 0x594eca, 0x5)
/Users/alden/gocsv/src/utils.go:147 +0x254
main.(*JoinSubcommand).Run(0xc42000a120, 0xc42000e170, 0x2, 0x2)
/Users/alden/gocsv/src/join.go:51 +0x75
main.main()
/Users/alden/gocsv/src/gocsv.go:96 +0x22d
Would like to be able to specify how many rows to skip before performing a task, for situations where an export I am editing has more than one header row.
I've downloaded gocsv-windows-4.0-amd64.zip
and when I extract gocsv.exe, it triggers Sophos' malware detection.
The malware in question is identified as CXrep/MalGo-A. Is there an alternative?
When using the stack subcommand, if the file headers of any of the subsequent files (after the 1st) don't match up, print a message with the filename that is failing.
Provide a switch that allows the input of filenames to be directory based. If a directory has more files than is allowed on the command line (when using * for example) the command fails (as expected).
e.g. this fails
gocsv stack /my/dir/of/10M_files/*.csv
So the only way to get around this is to:
find /my/dir/of/10M_files/ -name "*.csv" -exec gocsv stack "{}" ";"
go get github.com/DataFoxCo/gocsv
now results in:
src/github.com/DataFoxCo/gocsv/cmd/xlsx.go:105:27: sheet.Rows undefined (type *xlsx.Sheet has no field or method Rows)
Cannot convert file from stdin
GoCSV is currently using Sprig v3.1.0. The latest Sprig is v3.2.1, which notably has math functions for floats (3.1.0 only has integer math functions).
When using gocsv's split subcommand, the output file names have suffixes such as -1.csv, etc., so if there are more than 9 the filenames don't sort properly. It would be nice to optionally zero-pad the number in the suffix with enough zeroes that they sort properly.
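The padding itself is a one-liner once the total number of chunks is known; a sketch (the "-NN.csv" suffix shape is assumed from the issue, not taken from gocsv's code):

```go
package main

import (
	"fmt"
	"strconv"
)

// paddedSuffix zero-pads the chunk index so output filenames sort
// lexically: out-01.csv sorts before out-12.csv.
func paddedSuffix(base string, index, total int) string {
	width := len(strconv.Itoa(total)) // digits needed for the largest index
	return fmt.Sprintf("%s-%0*d.csv", base, width, index)
}

func main() {
	fmt.Println(paddedSuffix("out", 1, 12))  // out-01.csv
	fmt.Println(paddedSuffix("out", 12, 12)) // out-12.csv
}
```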
I work almost exclusively with semicolon-delimited CSVs and it's a bit frustrating to always have to use gocsv delim before and after the actual commands. In some cases, like joins, gocsv delim isn't even really an option. Would you consider supporting an alternative default delimiter, perhaps set via an environment variable? Would you review a pull request that adds this feature?
Since I needed this feature quickly, I set the default delimiter to semicolon in a fork, but it would of course be nicer to make it configurable in the main repository. I think tabs are also very common as delimiter and that would be another use case.
The join syntax doesn't seem to quite match the sql syntax.
When I 'left' join a CSV file I'm expecting to get rows from the left table only if they match a row in the right-hand table.
In fact I get every row in the left table regardless of whether they are in the right table or not.
This perhaps seems like it might make sense for csv files, but then what does the 'outer' join method do?
When converting an XLSX to CSV, if there are 5 columns, you'd expect the output to be:
column_01,column_02,column_03,column_04,column_05
text,text,text,text,text
However if the last column is blank you get this:
text,text,text,text
It is missing that last comma delimiter, and running other operations like select will error at that row due to an incorrect number of columns.
How do I use unique and then save the result as a named CSV file?
I'm using the --regex and --repl args.
The problem is that I need to preserve the original column.
What I really want is to output the result of the replace into a new column.
I use the gocsv join subcommand with 2 CSV files, but only with a single-column condition.
If more than one column is to be joined on, it fails.
How do I join 2 CSV files on a multiple-column condition?
The following function has two issues:
Lines 175 to 184 in 5924c92
Sometimes I want to squeeze as many columns onto the screen as will fit, and I try gocsv view -w 1 file.csv, like:
go run main.go view -w 1 << EOF
Col1-really-long-name,Col2-really-long-name
1,2
3,4
EOF
I naively expect something like:
+---+---+
| C | C |
+---+---+
| 1 | 2 |
+---+---+
| 3 | 4 |
+---+---+
Instead GoCSV panics:
panic: runtime error: slice bounds out of range [:-2]
...
getTruncatedLine assumes that width is greater than or equal to 3 when it tries to truncate an extra 3 chars (but not runes) to make room for the ellipsis, return line[:width-3] + "...", leading to a negative high bound for the re-slice operation.
I think if we want to keep the ellipsis, then view's Run function should guard against a width of 1 or 2:
Lines 34 to 38 in 5924c92
and the documentation should be updated to say --max-width must be a minimum of 3 (since the user would probably be confused or disappointed that 0 did nothing).
Those fixes won't actually give me the expected output I shared, but I understand that's not possible without other changes, so what I actually want is not part of this issue.
As for "(but not runes)" in the explanation above...
go run main.go view -w 15 << EOF
"Foobarbaz 日本のルーン",Col2-really-long-name
1,2
3,4
EOF
I expect:
+-----------------+-----------------+
| Foobarbaz 日本... | Col2-really-... |
+-----------------+-----------------+
| 1 | 2 |
+-----------------+-----------------+
| 3 | 4 |
+-----------------+-----------------+
Instead:
+-----------------+-----------------+
| Foobarbaz �... | Col2-really-... |
+-----------------+-----------------+
| 1 | 2 |
+-----------------+-----------------+
| 3 | 4 |
+-----------------+-----------------+
While getTruncatedLine correctly checks for length with utf8.RuneCountInString, when it comes to truncating it operates on the (UTF-8-encoded) string, so its slice bounds are off: return line[:width-3].
line could be converted to a slice of runes, re-sliced to width, then converted back to string.
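Putting both fixes together, a sketch of a rune-safe truncation (my version with illustrative names, not a patch against the actual source):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateLine shortens line to at most width runes, ending with "..." when
// truncation happens. Widths below 3 leave no room for the ellipsis, so we
// fall back to a hard cut; ideally the caller rejects --max-width < 3.
func truncateLine(line string, width int) string {
	if utf8.RuneCountInString(line) <= width {
		return line
	}
	runes := []rune(line) // slice runes, not bytes, so multi-byte chars survive
	if width < 3 {
		return string(runes[:width])
	}
	return string(runes[:width-3]) + "..."
}

func main() {
	fmt.Println(truncateLine("Foobarbaz 日本のルーン", 15)) // Foobarbaz 日本...
	fmt.Println(truncateLine("short", 15))                 // short
}
```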
I've been using left/right outer joins and need to identify rows where the join didn't match.
I can do this with a filter and a regex but that's a bit painful.
It would be nice to have a filter that explicitly matches a blank/empty column.
e.g.
gocsv filter --columns a,b --empty
I think the semantics of empty should evaluate to true even if the field contains spaces.
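The proposed semantics are simple to express; a sketch (function name mine, not gocsv's code):

```go
package main

import (
	"fmt"
	"strings"
)

// isEmptyField reports whether a field is empty or whitespace-only,
// matching the proposed semantics of a --empty filter.
func isEmptyField(s string) bool {
	return strings.TrimSpace(s) == ""
}

func main() {
	fmt.Println(isEmptyField("   ")) // true: spaces still count as empty
	fmt.Println(isEmptyField("x"))   // false
}
```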
It took me a little while to work out what was going wrong, and this may be a little hard to fix.
Essentially I habitually add a space after a comma.
The result for me was that when specifying columns to the --columns switch I was getting the error 'too many files'.
I eventually worked out that the --columns switch looks for the first space to determine the end of the column list.
This isn't documented and the resulting error was non-obvious.
This either needs to be explicitly documented or the command line changed to deal with spaces.
It looks like changing the command-line parsing could only be done in a non-backward-compatible way, so it might be better to just highlight this fact in the doco.
I am on Windows 10 and ran
gocsv sql --query "SELECT column FROM sample" sample.csv
but in vain.
The CMD window shows
Error: Binary was compiled with 'CGO_ENABLED=0', go-sqlite3 requires cgo to work. This is a stub
It would be great if gocsv could convert csv input into json output :-)
The doco doesn't provide any description of how to stipulate a column name when the same column name appears in multiple CSV files, or even when the same column name appears twice in a single file.
For instance, I'm joining two files that have a common column name.
The contents of these two columns may be different.
I am looking to run a filter after joining the two files but can't work out how to stipulate the second of the two column names to apply the filter to.
Firstly, thanks for this tool - I like it, very useful. One question: the clean option introduces a new empty line at the end of the file.
input:
A,B,,,
0,0.8570,,,
499,0.8570,,,
999,0.9021,,,
1499,0.9498,,,
1999,1.0000,,,
2499,1.0527,,,
2999,1.0528,,,
3499,1.0528,,,
becomes:
A,B
0,0.8570
499,0.8570
999,0.9021
1499,0.9498
1999,1.0000
2499,1.0527
2999,1.0528
3499,1.0528
Is this intentional and by design?
The fix for #54 changes the delim subcommand's default behavior concerning the input and output delimiters.
The subcommand used to do a run-time check: if the parsed rune for either in or out was not the zero value, the delimiter was set to that rune; if the rune was the zero value, the csv.Reader's default Comma value (,) was used.
#54 changed the delimiter parsing to error out immediately if no delimiter was explicitly set: the csv.Reader's default Comma value no longer matters.
Now, the delim subcommand requires both the in and out delimiters to be explicitly set. That breaks at least one existing alias/script for me. I'd like delim to return to having a default behavior that just works. Setting default values for the input and output flags seems like a sensible correction to me:
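A sketch of what I mean, giving both flags a comma default so an unset flag falls back to the old behavior instead of erroring out; the flag wiring and names here are illustrative, not the actual delim source:

```go
package main

import (
	"flag"
	"fmt"
)

// delimFlags parses input/output delimiter flags with "," as the default,
// so leaving either unset just works. Illustrative, not gocsv's code.
func delimFlags(args []string) (in, out string, err error) {
	fs := flag.NewFlagSet("delim", flag.ContinueOnError)
	inFlag := fs.String("input", ",", "input delimiter")
	outFlag := fs.String("output", ",", "output delimiter")
	if err := fs.Parse(args); err != nil {
		return "", "", err
	}
	return *inFlag, *outFlag, nil
}

func main() {
	in, out, _ := delimFlags([]string{"--output", "\t"})
	fmt.Printf("%q -> %q\n", in, out)
}
```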
Thank you for developing this, it seems to be a very useful tool. However, when converting an xlsx file to csv with gocsv xlsx, it would be great to have the option to specify the encoding of the output csv file.
Also, when doing a batch conversion of multiple files, it would be nice to be able to specify the destination directory for the output files instead of creating a directory for the CSVs.
Feature request: It would be amazing to be able to perform the following analysis with gocsv - unless of course I've missed something!
https://www.fireeye.com/blog/threat-research/2012/11/indepth-data-stacking.html
Would you be open to having gocsv be buildable and installable with go get?
I would like to request the ability to run basic regex transformation on a column.
Something along the format of:
--regexreplace "regex string to match" "replacement regex"
For example, there are a few main ways I generally use my regexes - 2 of which use capture groups:
Ex. 1 : I have a column that has only datafox urls and I would like, for any cell that matches the regex, to only give me the id or slug at the end:
--regexreplace "^http://datafox.com/.*/" ""
Replacement with an empty string
--regexreplace "(http://datafox.com/.*/)(\w{24})" "$2"
Replacement with the second capture group
--regexreplace "(last name), (first name)" "$2 $1"
Formatting but swapping capture groups and adding a space in between - where all characters are literal except for the capture group # reference
--regexreplace ".*@gmail.com|.*@aol.com|.*@hotmail.com" ""
Another example of replacement with empty string so that only cells without those matches would remain
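All four examples map directly onto Go's regexp package, which uses the same $1/$2 group references; a sketch (function name mine, mirroring the proposed --regexreplace "match" "replacement" shape):

```go
package main

import (
	"fmt"
	"regexp"
)

// regexReplace applies one match/replacement pair to a cell value,
// mirroring the proposed --regexreplace "match" "replacement" flags.
func regexReplace(cell, match, repl string) string {
	return regexp.MustCompile(match).ReplaceAllString(cell, repl)
}

func main() {
	// Ex. 2: keep only the trailing id via the second capture group.
	fmt.Println(regexReplace("http://datafox.com/x/abcdefabcdefabcdefabcdef",
		`(http://datafox.com/.*/)(\w{24})`, "$2"))
	// Ex. 3: swap "last, first" into "first last".
	fmt.Println(regexReplace("Doe, Jane", `(\w+), (\w+)`, "$2 $1"))
}
```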