Giter Club home page Giter Club logo

Comments (11)

johnkerl avatar johnkerl commented on May 22, 2024

@jungle-boogie there are a few options. Sample data:

cat x
4
5
$6

First is sed:

$ sed 's/\$//g' x | mlr put '$s = $1 * 2'
1=4,s=8.000000
1=5,s=10.000000
1=6,s=12.000000

Second is sub: if you are using the latest Miller release, 2.2.1, which doesn't have regex support, then

$ cat x | mlr put '$1 = sub(string($1), "$", "")' then put '$s=$1 * 2' 
1=4.000000,s=8.000000
1=5.000000,s=10.000000
1=6,s=12.000000

Or if you're using head, which has regex support, now $ has special meaning and must be escaped with backslash but otherwise the same:

$ cat x | mlr put '$1 = sub(string($1), "\$", "")' then put '$s=$1 * 2' 
1=4.000000,s=8.000000
1=5.000000,s=10.000000
1=6,s=12.000000

from miller.

jungle-boogie avatar jungle-boogie commented on May 22, 2024

Hi @johnkerl,

In my example, this is what I was hoping for:
% mlr stats1 -a sum -f 1 numbers
1_sum=3097.250000

This is the sum of all fields together where yours seems to multiply values.

I was going to use stats1 -a sum to sum together some data.

The sed approach is along the right lines:
sed 's/\$//g' x.csv | mlr --csv stats1 -a sum -f Amount
Amount_sum
12098.140000

from miller.

johnkerl avatar johnkerl commented on May 22, 2024

Yes, I used times two to keep it simple, and to emphasize that this is a data-formatting question not specific to stats, and I used a smaller sample dataset to save screen space.

Your particular example in each of the three cases above would be:

Data:

$ cat > x
800
50
30
60
46.67
1654.48
$456.1

Use sed to remove the dollar sign:

$ sed 's/\$//g' x | mlr stats1 -a sum -f 1
1_sum=3097.250000

Use mlr sub to remove the dollar sign, from 2.2.1:

$ cat x | mlr put '$1 = sub(string($1), "$", "")' then stats1 -a sum -f 1
1_sum=3097.250000

Use mlr sub to remove the dollar sign, from head:

$ cat x | mlr put '$1 = sub(string($1), "\$", "")' then stats1 -a sum -f 1 
1_sum=3097.250000

from miller.

jungle-boogie avatar jungle-boogie commented on May 22, 2024

Hi,

I think I'm getting lost with the switch possibilities.

Here's some specific data in question:
https://gist.github.com/jungle-boogie/7d9d1aec1d02588d00f5

sed works:
sed 's/\$//g' sample.csv| mlr stats1 -a sum -f 8
8_sum=1497.870000

but I have to remove the header.

% cat sample.csv | mlr --ocsv put '$8 = sub(string($8), "\$", "")' then stats1 -a sum -f 8
Couldn't parse "Amount" as number.
% cat sample.csv | mlr --ocsv put '$8 = sub(string($8), "\$", "")' then stats1 -a sum -f Amount
mlr: Couldn't find field named "Amount"

Removing the header prior to the above commands will also work using either ocsv, oxtab or none...that's just the output format.

I recommend this be considered a possible FAQ on #74

from miller.

johnkerl avatar johnkerl commented on May 22, 2024

@jungle-boogie -- please see aa666cb

from miller.

johnkerl avatar johnkerl commented on May 22, 2024

If you have CSV-formatted input then please use --icsv. If you have CSV-formatted input and you want CSV-formatted output then you could use --icsv --ocsv but --csv is a keystroke-saver for that.

$ cat sample.csv
EventOccurred,EventType,Description,Status,PaymentType,NameonAccount,TransactionNumber,Amount
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,John,1,$230.36
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Fred,2,$32.25
10/1/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Bob,3,$39.02
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Alice,4,$57.54
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Jungle,5,$230.36
10/1/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Joe,6,$281.96
10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,7,$188.19
10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,8,$188.19
10/2/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Anthony,9,$250.00

$ mlr --icsv --opprint cat sample.csv    
EventOccurred EventType    Description                               Status   PaymentType NameonAccount TransactionNumber Amount
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    John          1                 $230.36
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Fred          2                 $32.25
10/1/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Bob           3                 $39.02
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Alice         4                 $57.54
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Jungle        5                 $230.36
10/1/2015     Charged Back Reason: Payment Stopped                   Disputed Checking    Joe           6                 $281.96
10/2/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Joseph        7                 $188.19
10/2/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Joseph        8                 $188.19
10/2/2015     Charged Back Reason: Payment Stopped                   Disputed Checking    Anthony       9                 $250.00

$ cat sample.csv | mlr --csv put '$Amount = sub(string($Amount), "\$", "")' then stats1 -a sum -f Amount
Amount_sum
1497.870000

from miller.

johnkerl avatar johnkerl commented on May 22, 2024

http://johnkerl.org/miller/doc/faq.html#How_do_I_do_arithmetic_on_fields_with_currency_symbols%3f

from miller.

jungle-boogie avatar jungle-boogie commented on May 22, 2024

Hi @johnkerl,

That's perfect!

from miller.

jungle-boogie avatar jungle-boogie commented on May 22, 2024

If you have CSV-formatted input then please use --icsv.

The problem is that if I just vim sample.csv and paste in sample data above and save, then the file type is wrong:
file sample.csv
sample.csv: ASCII text

So I need to keep that in mind and use mlr --csv --rs lf when necessary.

from miller.

jungle-boogie avatar jungle-boogie commented on May 22, 2024

Here's a really long chain:
mlr --csv --ofmt '%.2lf' put '$Amount = sub(string($Amount), "\$", "")' then count-distinct -f Description,Status,PaymentType,Amount then stats1 -a sum -f Amount -g Description,Status,PaymentType sample.csv

Description,Status,PaymentType,Amount_sum
Reason: Authorization Revoked By Customer,Disputed,Checking,320.15
Reason: Customer Advises Not Authorized,Disputed,Checking,227.21
Reason: Payment Stopped,Disputed,Checking,531.96

from miller.

johnkerl avatar johnkerl commented on May 22, 2024

Re "So I need to keep that in mind and use mlr --csv --rs lf when necessary" ... agreed, it's unfortunate that RFC-CSV specifies CRLF terminators. In my daily use I generally use --csvlite since (a) it's fewer characters to type, (b) none of my data have CRLF terminators, and (c) I almost never need double-quote support.

from miller.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.