Comments (6)
@Imperatorn Great to know there's someone trying the tools on Windows! Report any problems you have, Windows related or otherwise!
from tsv-utils.
Does the merge into v2.1.2 for #320 imply that, for newline handling going forward, it might be easiest to adopt option 3?
Option 3, reading both newline forms, but writing Unix newlines
In any case, while waiting for a full Windows release/build, could just csv2tsv.exe be made available, assuming it passes any needed tests? This would enable Windows users to generate valid TSV from CSV without Excel, perhaps by:
- validate foo.csv with some tool(s), for example https://csvlint.io/. But be warned: if you download the "standardized" CSV they offer, I think it silently adds double-quotes around every field, including numbers. For example, if foo.csv has a row
  foo,22
  it becomes
  "foo","22"
  in the "standardized" CSV file (not sure why).
- csv2tsv.exe foo.csv > foo.tsv
Note that csv2tsv by default removes double-quotes where not needed, so foo.tsv would be foo<TAB>22.
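To make the quoting point concrete, here is a minimal Python sketch using the standard csv module. It is not csv2tsv's implementation (tsv-utils is written in D); it only illustrates why foo,22 and "foo","22" end up as the same TSV line. The function name csv_to_tsv is hypothetical, the tsv_delim parameter only loosely mirrors the --tsv-delim option discussed in this thread, and replacing embedded tabs/newlines with a space is an assumption of this sketch.

```python
import csv
import io


def csv_to_tsv(csv_text: str, tsv_delim: str = "\t", replacement: str = " ") -> str:
    """Convert CSV text to escape-free delimited text (illustrative sketch only)."""

    def clean(field: str) -> str:
        # TSV has no quoting or escapes, so characters that would break the
        # output format are replaced (a space by default; an assumption here).
        for bad in ("\r\n", "\n", "\r", tsv_delim):
            field = field.replace(bad, replacement)
        return field

    out = []
    # newline="" lets csv.reader see the raw line endings, so both LF and CRLF
    # input work and newlines inside quoted fields reach clean() intact.
    for row in csv.reader(io.StringIO(csv_text, newline="")):
        # csv.reader has already removed CSV quoting: foo,22 and "foo","22"
        # both arrive here as ["foo", "22"].
        out.append(tsv_delim.join(clean(f) for f in row) + "\n")
    return "".join(out)
```

Calling csv_to_tsv on either form of the example row returns "foo\t22\n"; passing tsv_delim=";" gives the semicolon-separated variant instead.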
May need some documentation on how to pass escaped command-line arguments to csv2tsv.exe on Windows when using cmd or PowerShell...
For example, on Linux/macOS, we can create a file with scsv (semicolon-separated values) using any of these:
csv2tsv --tsv-delim $";" foo.csv > foo.scsv
csv2tsv --tsv-delim $';' foo.csv > foo.scsv
csv2tsv --tsv-delim \; foo.csv > foo.scsv
And I'm thinking none of the above command lines would work on Windows. Perhaps ^; would work, per which-symbol-is-escape-character-in-cmd.
But if you need to specify a tab as a command-line argument, then instead of cmd, Windows folks may need to use PowerShell, which can escape a tab as backtick-t (`t), per the About special characters in PowerShell docs.
Finally, another work-around that avoids both cmd and PowerShell completely: install Git for Windows (choco install git), or some other bash shell for Windows. Then run csv2tsv.exe in that shell, if csv2tsv.exe can handle arguments passed to it from bash.exe.
Hi @porteusconf. Thanks for the feedback and suggestions. Some comments in-line below.
Does the merge into v2.1.2 for #320 imply that, for newline handling going forward, it might be easiest to adopt option 3?
Option 3, reading both newline forms, but writing Unix newlines
Option 1, Unix newlines only on both input and output, is by far the easiest (lowest investment cost). Option 3 is a fair bit more expensive. Much of this comes from increased test suite cost, and some from the fact that several tools have their own reader functionality (for example, tsv-sample).
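For a sense of what option 3 means inside a tool's reader, here is a minimal Python sketch: accept LF or CRLF on input, always emit LF on output. The names read_lines_any_newline, write_unix_lines, and normalize_bytes are hypothetical, not tsv-utils APIs. Because several tools have their own readers, logic like this (and its tests) would have to be repeated in each of them.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def read_lines_any_newline(src: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each line without its terminator, accepting both LF and CRLF."""
    for raw in src:  # iterating a binary stream splits on b"\n" only
        if raw.endswith(b"\r\n"):
            yield raw[:-2]
        elif raw.endswith(b"\n"):
            yield raw[:-1]
        else:
            yield raw  # final line without a terminator


def write_unix_lines(lines: Iterable[bytes], dst: BinaryIO) -> None:
    """Write every line with a Unix (LF) terminator, regardless of input form."""
    for line in lines:
        dst.write(line + b"\n")


def normalize_bytes(data: bytes) -> bytes:
    """Convenience wrapper: option-3 style round trip over an in-memory buffer."""
    out = io.BytesIO()
    write_unix_lines(read_lines_any_newline(io.BytesIO(data)), out)
    return out.getvalue()
```

With real files, the same functions could be pointed at sys.stdin.buffer and sys.stdout.buffer. One design choice to note: this sketch adds a missing final newline; a real tool might prefer to preserve the input as-is.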
A relevant question is how much additional benefit investing in option 3 would bring. It's a question I don't know the answer to: how many users are affected, how prevalent such data files are, and how onerous the alternatives are, such as invoking dos2unix on the data first.
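The dos2unix alternative moves the work out of the tools entirely: normalize the file once, then feed it to tools that only expect Unix newlines. A rough Python equivalent of that preprocessing step is sketched below; crlf_to_lf and crlf_to_lf_bytes are hypothetical helpers, not a description of how dos2unix itself works. The only subtle part is a CRLF pair split across two read chunks.

```python
import io
from typing import BinaryIO


def crlf_to_lf(src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024) -> None:
    """Copy src to dst, turning each CRLF pair into LF; lone CRs are kept."""
    pending_cr = False  # True if the previous chunk ended with b"\r"
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if pending_cr:
            chunk = b"\r" + chunk  # re-attach the held-back CR
            pending_cr = False
        if chunk.endswith(b"\r"):
            # The matching LF may be in the next chunk, so hold this CR back.
            chunk = chunk[:-1]
            pending_cr = True
        dst.write(chunk.replace(b"\r\n", b"\n"))
    if pending_cr:
        dst.write(b"\r")  # a CR at end of input has no LF partner; keep it


def crlf_to_lf_bytes(data: bytes, chunk_size: int = 64 * 1024) -> bytes:
    """In-memory wrapper around crlf_to_lf, handy for testing."""
    out = io.BytesIO()
    crlf_to_lf(io.BytesIO(data), out, chunk_size)
    return out.getvalue()
```

In a small script, crlf_to_lf(sys.stdin.buffer, sys.stdout.buffer) would then play the role of dos2unix at the front of a pipeline.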
In any case, while waiting for a full Windows release/build, could just csv2tsv.exe be made available, assuming it passes any needed tests? This would enable Windows users to generate valid TSV from CSV without Excel, ...
Well, I'm reluctant to create pre-built binary packages for only a single tool. However, I see the merit behind this idea; perhaps there are ways to get the desired effect.
First, note that nothing prevents cloning the git repo and building the tools on Windows. The test suite is not complete for Windows, but that doesn't mean the tools won't work properly. And to your point, csv2tsv would likely pass a more complete test suite, simply because the csv2tsv test suite already includes examples of files with Windows newlines.
What could be done in this regard is to: (a) publish test suite status info for csv2tsv by itself; (b) add any missing csv2tsv tests; (c) add specific instructions describing how to build on Windows.
perhaps by:
- validate foo.csv with some tool(s), for example https://csvlint.io/. But be warned: if you download the "standardized" CSV they offer, I think it silently adds double-quotes around every field, including numbers. For example, if foo.csv has a row
  foo,22
  it becomes
  "foo","22"
  in the "standardized" CSV file (not sure why).
- csv2tsv.exe foo.csv > foo.tsv
Note that csv2tsv by default removes double-quotes where not needed, so foo.tsv would be foo<TAB>22.
csv2tsv doesn't have any trouble reading any of these formats, but, as you point out, it always generates escape-free TSV.
May need some documentation on how to pass escaped command-line arguments to csv2tsv.exe on Windows when using cmd or PowerShell...
Good thoughts, thank you.
Finally, another work-around that avoids both cmd and PowerShell completely: install Git for Windows (choco install git), or some other bash shell for Windows. Then run csv2tsv.exe in that shell, if csv2tsv.exe can handle arguments passed to it from bash.exe.
Agreed, it might make sense to include this option in the documentation.
Status?
Status?
Status as described in the main description is up-to-date. It is updated as things change. At present, there are no known failure cases on Windows. But, since the test suite doesn't run fully, it leaves unknowns. Also, there's a lack of real-world use on Windows, or at least use that gets reported. So it is more about unknowns at this point.
Do you have specific questions?
No, I was just wondering why there weren't any Windows binaries. I've put them here for anyone interested:
https://github.com/Imperatorn/tsv-utils/releases