Comments (8)

johnkerl commented on May 22, 2024

Is this your desired output?

$ cat x
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1
a=0,b=0,key=klucz2,c=4
a=2,b=3,key=klucz2,c=3
a=1,b=2,key=klucz2,c=0
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1

$ cat x | mlr uniq -g a,b,key,c
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1
a=0,b=0,key=klucz2,c=4
a=2,b=3,key=klucz2,c=3
a=1,b=2,key=klucz2,c=0

from miller.

johnkerl commented on May 22, 2024

If so, this sounds like system uniq, or sort -u, or https://github.com/johnkerl/scripts/blob/master/fundam/uniqm

johnkerl commented on May 22, 2024

mlr uniq -g ... already does this if you type out all the column names. There could be a mlr uniq -a which does the uniqueness check on all column names without you needing to type them all out. For DKVP, that would be no better than system uniq. For CSV, though, it would add value since it would be header-aware.
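As a rough illustration of a uniqueness check over all fields, here is a minimal Python sketch (not Miller code). The helper names `parse_dkvp` and `uniq_all_fields` are hypothetical, invented only for this example, and the sketch does not claim to mirror how Miller works internally.

```python
def parse_dkvp(line):
    """Split a DKVP line such as 'a=1,b=2' into a dict of field -> value."""
    return dict(pair.split("=", 1) for pair in line.strip().split(","))


def uniq_all_fields(lines):
    """Keep the first occurrence of each record, comparing on every field.

    The input does not need to be sorted: a set remembers records already seen.
    """
    seen = set()
    result = []
    for line in lines:
        record = parse_dkvp(line)
        fingerprint = tuple(record.items())  # field names and values, in order
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(record)
    return result


lines = [
    "a=1,b=2,key=klucz,c=3",
    "a=0,b=0,key=klucz2,c=4",
    "a=1,b=2,key=klucz,c=3",  # repeat of the first record, not adjacent to it
]
for record in uniq_all_fields(lines):
    print(",".join(f"{k}={v}" for k, v in record.items()))
```

Because the comparison uses field names as well as values, the same idea carries over to header-aware formats such as CSV once each row has been paired with its header.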

Komosa commented on May 22, 2024

No, this output is not desired. I repeated rows in the example only to show that the input doesn't have to be sorted.

The basic use case for aggr is to sum all columns for each key. The simplest real-world scenario: given a list of financial transactions, we want to compute a summary for each month (or day, or year, ...).

Komosa commented on May 22, 2024

I can also see one additional gotcha:
some fields may not be suitable for aggregation, especially strings. It may be worth using a better strategy than just dropping those fields.

I can propose the following solutions for this case:

  1. use first/last value - simplest
  2. use most common (top one)
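The two fallback strategies above could be sketched roughly as follows in Python. The function names `is_number` and `aggregate_values`, and the `fallback` parameter with its values `"first"`, `"last"`, and `"most_common"`, are hypothetical and exist only for illustration; nothing here is an existing Miller option.

```python
from collections import Counter


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def aggregate_values(values, fallback="first"):
    """Sum numeric values; otherwise pick one value using the fallback strategy."""
    if all(is_number(v) for v in values):
        return sum(float(v) for v in values)
    if fallback == "first":
        return values[0]
    if fallback == "last":
        return values[-1]
    if fallback == "most_common":
        # most_common(1) returns a list holding one (value, count) pair
        return Counter(values).most_common(1)[0][0]
    raise ValueError(f"unknown fallback strategy: {fallback}")


numbers = ["1", "4", "3"]
strings = ["EUR", "PLN", "PLN", "USD"]  # made-up sample values
print(aggregate_values(numbers))
print(aggregate_values(strings, fallback="first"))
print(aggregate_values(strings, fallback="last"))
print(aggregate_values(strings, fallback="most_common"))
```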

johnkerl commented on May 22, 2024

Sorry, I didn't read your example closely enough.

Why is this different from

$ cat x
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1
a=0,b=0,key=klucz2,c=4
a=2,b=3,key=klucz2,c=3
a=1,b=2,key=klucz2,c=0
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1

$ cat x | mlr stats1 -a sum -f a,b,c -g key
key=klucz,a_sum=16.000000,b_sum=20.000000,c_sum=14.000000
key=klucz2,a_sum=3.000000,b_sum=5.000000,c_sum=7.000000

?

In the general case, by what criteria do I keep the first batch of three key=klucz rows distinct from the second batch of three? All six of them have the common key=klucz so mlr stats1 -g key aggregates all six of them.
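To make the grouping behaviour concrete, here is a small Python sketch of summing fields per key over unsorted input. It is an illustration of the idea only, not Miller's implementation, and `parse_dkvp` and `sum_by_key` are hypothetical helper names.

```python
def parse_dkvp(line):
    """Split a DKVP line such as 'a=1,b=2' into a dict of field -> value."""
    return dict(pair.split("=", 1) for pair in line.strip().split(","))


def sum_by_key(records, group_field, value_fields):
    """Sum value_fields per distinct value of group_field.

    Rows with the same key are merged wherever they occur in the input,
    so non-adjacent batches with the same key end up in one total.
    """
    sums = {}  # dicts keep insertion order, so groups appear in first-seen order
    for record in records:
        totals = sums.setdefault(record[group_field], {f: 0.0 for f in value_fields})
        for field in value_fields:
            totals[field] += float(record[field])
    return sums


lines = [
    "a=1,b=2,key=klucz,c=3",
    "a=4,b=4,key=klucz,c=3",
    "a=3,b=4,key=klucz,c=1",
    "a=0,b=0,key=klucz2,c=4",
    "a=2,b=3,key=klucz2,c=3",
    "a=1,b=2,key=klucz2,c=0",
    "a=1,b=2,key=klucz,c=3",
    "a=4,b=4,key=klucz,c=3",
    "a=3,b=4,key=klucz,c=1",
]
result = sum_by_key([parse_dkvp(line) for line in lines], "key", ["a", "b", "c"])
for key, totals in result.items():
    print(key, totals)
```

With the nine rows from the example, both batches of key=klucz rows fall into the same group, which gives the totals 16, 20, and 14 shown above.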

Komosa commented on May 22, 2024

Oh, I completely missed the -g switch to stats1, sorry for the trouble.
So, in general, this feature proposal is already implemented ;)

Is it possible to use a different aggregation type for different fields (other than filtering the results afterwards)?
And am I correct that -a mode works with strings?

johnkerl commented on May 22, 2024

Correct on both: Yes, -a mode works for strings. No, at present you can't, say, only sum on one field and only max on another: if there are m aggregators and n fields then you get m*n aggregate outputs.
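The m*n behaviour can be pictured with a short Python sketch: every aggregator is applied to every field. The `AGGREGATORS` table and the `all_stats` function are hypothetical names for illustration, and the `field_aggregator` output naming is assumed from the `a_sum` style seen earlier in the thread. Grouping is left out to keep the example short.

```python
# Hypothetical table mapping aggregator names to Python functions.
AGGREGATORS = {"sum": sum, "max": max}


def all_stats(records, aggregator_names, fields):
    """Apply every named aggregator to every field: m aggregators * n fields outputs."""
    out = {}
    for field in fields:
        values = [float(record[field]) for record in records]
        for name in aggregator_names:
            out[f"{field}_{name}"] = AGGREGATORS[name](values)
    return out


# The three key=klucz2 rows from the example, restricted to fields a and b.
records = [
    {"a": "0", "b": "0"},
    {"a": "2", "b": "3"},
    {"a": "1", "b": "2"},
]
stats = all_stats(records, ["sum", "max"], ["a", "b"])
print(len(stats))  # 2 aggregators * 2 fields
print(stats)
```

Picking only a sum for one field and only a max for another would then mean selecting the wanted entries from this full set afterwards.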
