Comments (8)
Is this your desired output?
$ cat x
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1
a=0,b=0,key=klucz2,c=4
a=2,b=3,key=klucz2,c=3
a=1,b=2,key=klucz2,c=0
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1
$ cat x | mlr uniq -g a,b,key,c
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1
a=0,b=0,key=klucz2,c=4
a=2,b=3,key=klucz2,c=3
a=1,b=2,key=klucz2,c=0
from miller.
If so, this sounds like system uniq, or sort -u, or https://github.com/johnkerl/scripts/blob/master/fundam/uniqm
from miller.
mlr uniq -g ... already does this if you type out all the column names. There could be a mlr uniq -a which does the uniqueness check on all column names without you needing to type them all out. For DKVP that would be no better than uniq, but for CSV it would add value since it would be header-aware.
from miller.
No, this output is not desired. I repeated rows in the example only to show that the input doesn't have to be sorted.
The basic use case for aggr is to sum up all columns for each key. The simplest real-world scenario: given a list of financial transactions, we want to compute a summary for each month (or day, or year, ...).
from miller.
I can also see one additional gotcha:
some fields may not be suitable for aggregation, especially strings. It may be worth using a better strategy than just dropping those fields.
I can propose the following solutions for this case:
- use first/last value - simplest
- use most common (top one)
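To make the proposed strategies concrete, here is a minimal sketch in plain awk, not a Miller feature. It assumes a made-up string field named label and made-up output names (label_first, label_last, label_mode); ties for the most common value go to whichever value reached the top count first.

```shell
# Hypothetical sample: a string field "label" that cannot be summed.
input=$(mktemp)
cat > "$input" <<'EOF'
key=klucz,label=red
key=klucz,label=blue
key=klucz,label=blue
key=klucz2,label=green
key=klucz2,label=red
key=klucz2,label=green
EOF

result=$(awk -F, '
{
  split("", rec)                               # clear the per-record map
  for (i = 1; i <= NF; i++) {                  # parse name=value pairs
    eq = index($i, "=")
    rec[substr($i, 1, eq - 1)] = substr($i, eq + 1)
  }
  k = rec["key"]; v = rec["label"]
  if (!(k in first)) { first[k] = v; order[++n] = k }   # first value per key
  last[k] = v                                           # last value per key
  c = ++count[k, v]                                     # running tally per value
  if (c > best[k]) { best[k] = c; mode[k] = v }         # most common so far
}
END {
  for (j = 1; j <= n; j++) {                   # keys in first-seen order
    k = order[j]
    printf "key=%s,label_first=%s,label_last=%s,label_mode=%s\n", k, first[k], last[k], mode[k]
  }
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

The input does not need to be sorted by key, since every strategy only keeps a small amount of state per key.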
from miller.
Sorry, I didn't read your example closely enough.
Why is this different from
$ cat x
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1
a=0,b=0,key=klucz2,c=4
a=2,b=3,key=klucz2,c=3
a=1,b=2,key=klucz2,c=0
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1
$ cat x | mlr stats1 -a sum -f a,b,c -g key
key=klucz,a_sum=16.000000,b_sum=20.000000,c_sum=14.000000
key=klucz2,a_sum=3.000000,b_sum=5.000000,c_sum=7.000000
?
In the general case, by what criteria do I keep the first batch of three key=klucz rows distinct from the second batch of three? All six of them have the common key=klucz so mlr stats1 -g key aggregates all six of them.
from miller.
Oh, I totally missed the -g switch to stats1, sorry for the trouble.
So, in general, this feature proposal is already implemented ;)
Is it possible to use a different aggregation type for different fields (other than filtering the results afterwards)?
And am I correct that -a mode works with strings?
from miller.
Correct on both: Yes, -a mode works for strings. No, at present you can't, say, only sum on one field and only max on another: if there are m aggregators and n fields then you get m*n aggregate outputs.
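For anyone who needs one aggregator per field in the meantime, a small awk pass can do it outside Miller. This is only a sketch of the requested behavior (sum of a and max of b per key) on the sample data from this thread; the output names a_sum and b_max are chosen here to resemble the stats1 naming, and the temporary file is just for illustration.

```shell
# Sample data from this thread, written to a scratch file.
input=$(mktemp)
cat > "$input" <<'EOF'
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1
a=0,b=0,key=klucz2,c=4
a=2,b=3,key=klucz2,c=3
a=1,b=2,key=klucz2,c=0
a=1,b=2,key=klucz,c=3
a=4,b=4,key=klucz,c=3
a=3,b=4,key=klucz,c=1
EOF

result=$(awk -F, '
{
  split("", rec)                               # clear the per-record map
  for (i = 1; i <= NF; i++) {                  # parse name=value pairs
    eq = index($i, "=")
    rec[substr($i, 1, eq - 1)] = substr($i, eq + 1)
  }
  k = rec["key"]
  if (!(k in seen)) { seen[k] = 1; order[++n] = k }     # remember key order
  a_sum[k] += rec["a"]                                  # sum aggregator on a only
  if (!(k in b_max) || rec["b"] + 0 > b_max[k])         # max aggregator on b only
    b_max[k] = rec["b"] + 0
}
END {
  for (j = 1; j <= n; j++) {
    k = order[j]
    printf "key=%s,a_sum=%d,b_max=%d\n", k, a_sum[k], b_max[k]
  }
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

On the sample data this prints a_sum=16 for klucz and a_sum=3 for klucz2, which agrees with the stats1 sums shown earlier in the thread.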
from miller.