Comments (11)
@jungle-boogie there are a few options. Sample data:
cat x
4
5
$6
First is sed
:
$ sed 's/\$//g' x | mlr put '$s = $1 * 2'
1=4,s=8.000000
1=5,s=10.000000
1=6,s=12.000000
Second is sub
: if you are using the latest Miller release, 2.2.1, which doesn't have regex support, then
$ cat x | mlr put '$1 = sub(string($1), "$", "")' then put '$s=$1 * 2'
1=4.000000,s=8.000000
1=5.000000,s=10.000000
1=6,s=12.000000
Or if you're using head, which has regex support, now $
has special meaning and must be escaped with backslash but otherwise the same:
$ cat x | mlr put '$1 = sub(string($1), "\$", "")' then put '$s=$1 * 2'
1=4.000000,s=8.000000
1=5.000000,s=10.000000
1=6,s=12.000000
from miller.
Hi @johnkerl,
In my example, this is what I was hoping for:
% mlr stats1 -a sum -f 1 numbers
1_sum=3097.250000
This is the sum of all fields together where yours seems to multiply values.
I was going to use stats1 -a sum to sum together some data.
The sed approach is along the right lines:
sed 's/\$//g' x.csv | mlr --csv stats1 -a sum -f Amount
Amount_sum
12098.140000
from miller.
Yes, I used times two to keep it simple, and to emphasize that this is a data-formatting question not specific to stats, and I used a smaller sample dataset to save screen space.
Your particular example in each of the three cases above would be:
Data:
$ cat > x
800
50
30
60
46.67
1654.48
$456.1
Use sed
to remove the dollar sign:
$ sed 's/\$//g' x | mlr stats1 -a sum -f 1
1_sum=3097.250000
Use mlr sub
to remove the dollar sign, from 2.2.1:
$ cat x | mlr put '$1 = sub(string($1), "$", "")' then stats1 -a sum -f 1
1_sum=3097.250000
Use mlr sub
to remove the dollar sign, from head:
$ cat x | mlr put '$1 = sub(string($1), "\$", "")' then stats1 -a sum -f 1
1_sum=3097.250000
from miller.
Hi,
I think I'm getting lost with the switch possibilities.
Here's some specific data in question:
https://gist.github.com/jungle-boogie/7d9d1aec1d02588d00f5
sed works:
sed 's/\$//g' sample.csv| mlr stats1 -a sum -f 8
8_sum=1497.870000
but I have to remove the header.
% cat sample.csv | mlr --ocsv put '$8 = sub(string($8), "\$", "")' then stats1 -a sum -f 8
Couldn't parse "Amount" as number.
% cat sample.csv | mlr --ocsv put '$8 = sub(string($8), "\$", "")' then stats1 -a sum -f Amount
mlr: Couldn't find field named "Amount"
Removing the header prior to the above commands will also work using either ocsv, oxtab or none...that's just the output format.
I recommend this be considered a possible FAQ on #74
from miller.
@jungle-boogie -- please see aa666cb
from miller.
If you have CSV-formatted input then please use --icsv
. If you have CSV-formatted input and you want CSV-formatted output then you could use --icsv --ocsv
but --csv
is a keystroke-saver for that.
$ cat sample.csv
EventOccurred,EventType,Description,Status,PaymentType,NameonAccount,TransactionNumber,Amount
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,John,1,$230.36
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Fred,2,$32.25
10/1/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Bob,3,$39.02
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Alice,4,$57.54
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Jungle,5,$230.36
10/1/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Joe,6,$281.96
10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,7,$188.19
10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,8,$188.19
10/2/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Anthony,9,$250.00
$ mlr --icsv --opprint cat sample.csv
EventOccurred EventType Description Status PaymentType NameonAccount TransactionNumber Amount
10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking John 1 $230.36
10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking Fred 2 $32.25
10/1/2015 Charged Back Reason: Customer Advises Not Authorized Disputed Checking Bob 3 $39.02
10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking Alice 4 $57.54
10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking Jungle 5 $230.36
10/1/2015 Charged Back Reason: Payment Stopped Disputed Checking Joe 6 $281.96
10/2/2015 Charged Back Reason: Customer Advises Not Authorized Disputed Checking Joseph 7 $188.19
10/2/2015 Charged Back Reason: Customer Advises Not Authorized Disputed Checking Joseph 8 $188.19
10/2/2015 Charged Back Reason: Payment Stopped Disputed Checking Anthony 9 $250.00
$ cat sample.csv | mlr --csv put '$Amount = sub(string($Amount), "\$", "")' then stats1 -a sum -f Amount
Amount_sum
1497.870000
from miller.
http://johnkerl.org/miller/doc/faq.html#How_do_I_do_arithmetic_on_fields_with_currency_symbols%3f
from miller.
Hi @johnkerl,
That's perfect!
from miller.
If you have CSV-formatted input then please use --icsv.
The problem is that if I just vim sample.csv
and paste in sample data above and save, then the file type is wrong:
file sample.csv
sample.csv: ASCII text
So I need to keep that in mind and use mlr --csv --rs lf
when necessary.
from miller.
Here's a really long chain:
mlr --csv --ofmt '%.2lf' put '$Amount = sub(string($Amount), "\$", "")' then count-distinct -f Description,Status,PaymentType,Amount then stats1 -a sum -f Amount -g Description,Status,PaymentType sample.csv
Description,Status,PaymentType,Amount_sum
Reason: Authorization Revoked By Customer,Disputed,Checking,320.15
Reason: Customer Advises Not Authorized,Disputed,Checking,227.21
Reason: Payment Stopped,Disputed,Checking,531.96
from miller.
Re "So I need to keep that in mind and use mlr --csv --rs lf
when necessary" ... agreed, it's unfortunate that RFC-CSV specifies CRLF terminators. In my daily use I generally use --csvlite
since (a) it's fewer characters to type, (b) none of my data have CRLF terminators, and (c) I almost never need double-quote support.
from miller.
Related Issues (20)
- Read performance can be improved for high-column-count data
- Investigate shutdown latency on `mlr head` HOT 2
- Cryptic fatal error message for nonexistent files since 6.9.0 HOT 2
- Investigate spurious `[]` on JSON output in some cases HOT 4
- `flatten` not working on csv input data
- Bash process substitution not working with `put -f`
- Miller's `strptime` accepts fewer format options than `strptime`
- Inconsistent result when using `$*`
- Double-width characters spoil column alignment HOT 4
- `mlr --icsv --ojson cat < mlr.bug.csv` drops some columns HOT 5
- Add description for "put" verb HOT 1
- 'mlr cut' is very slow HOT 8
- mlr --otsv does not handle broken quotes correctly compared to --ocsv HOT 6
- JSON to CSV Error HOT 8
- exit code = 1 for --csv skip-trivial-records and csv file's last record is blank
- Automated way of clearing down column data HOT 7
- JSON flag documentation question HOT 1
- Equivalent to Excel function "data load from folder" then "combine and load" multiple CSV's finally "apply transformations" HOT 7
- Find and remove "string" retaining all other row data HOT 5
- Find and replace special character & with and using ssub HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from miller.