Comments (11)
I am not sure what you propose here. It seems all you describe above works as intended.
from dataframes.jl.
Thank you for the question. It seems to me that when mapcols! was conceived it was meant to work on the whole data frame and apply the modification in place, but what about when you want to apply a function to a bunch of columns with a specific criteria? I tried to use a view to propagate the modification to the parent, but the function doesn't accept SubDataFrames. I am aware that transform! should work just fine, but mapcols! is a great function too and quick to recall. The goal in the example is to have df modified directly by mapcols with the bang (!) operator, even if we can do df[!,2:3] .= mapcols!(z->z.-2, df[!,2:3])
from dataframes.jl.
what about when you want to apply a function to a bunch of columns with a specific criteria
Then mapcols! cannot work in such a case. The reason is that mapcols! can change the number of rows in a data frame, which in the "subsetting" case would leave the result a corrupted data frame. As you commented - use transform! in these more complex cases.
Example of mapcols! resizing a data frame:
julia> df = DataFrame(a=1:4, b=11:14)
4×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
   2 │     2     12
   3 │     3     13
   4 │     4     14

julia> mapcols!(x -> x[2:end-1], df)
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2     12
   2 │     3     13
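For the column-subset case raised above, a minimal sketch of the transform! route could look like the following. The data frame and the column selection 2:3 are only illustrative assumptions; renamecols=false keeps the original column names.

```julia
using DataFrames

# Illustrative data; column "a" should stay untouched
df = DataFrame(a=1:4, b=11:14, c=21:24)

# Apply the function only to columns 2 and 3, modifying df in place.
# The function keeps the number of rows, so the data frame stays consistent.
transform!(df, names(df, 2:3) .=> (z -> z .- 2); renamecols=false)
```

Because the function here returns a vector of the same length as its input, the untouched column "a" still lines up with the modified ones.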
from dataframes.jl.
Thanks for the explanation. In such a situation, I'd expect that reduction of the data frames as part of the modification, but I guess that would be dangerous for the minilanguage and could break things elsewhere. Thanks!
from dataframes.jl.
"Not feasible for integrity"
from dataframes.jl.
I'd expect that reduction of the data frames as part of the modification
Yes - assume that the sub data frame has columns "a" and "b" but you also have columns "c" and "d". If you shorten columns "a" and "b" it is not clear what to do with columns "c" and "d".
from dataframes.jl.
I'd say that whatever dimensional reduction you do on a part of the data frame should be propagated to the rest. I mean, whatever index reduction "a" and "b" undergo, "c" and "d" would respect whatever index is remaining after the transformation.
from dataframes.jl.
whatever index is remaining after the transformation.
But this is impossible to determine. Again - assume the original data frame has 10 rows and 4 columns. Assume that I apply mapcols! to columns 1 and 2 and resize them to 2 rows. Which 2 of the 10 rows in columns 3 and 4 should be kept is unclear: it could be the first 2, the last 2, or any other pair.
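To make the ambiguity concrete, here is a small sketch with made-up data (the column values and the three functions are assumptions chosen for illustration). All three calls produce a 2-row result, yet they correspond to different parent rows, or to no parent rows at all:

```julia
using DataFrames

# Illustrative 10×4 data frame
df = DataFrame(a=1:10, b=11:20, c=21:30, d=31:40)

# Three functions that all shrink columns 1 and 2 to 2 rows
first2 = mapcols(x -> x[1:2], df[:, 1:2])                  # parent rows 1 and 2
last2  = mapcols(x -> x[end-1:end], df[:, 1:2])            # parent rows 9 and 10
minmax = mapcols(x -> [minimum(x), maximum(x)], df[:, 1:2]) # not rows at all

# mapcols only sees the returned vectors, so nothing in the results
# says which rows of "c" and "d" would have to be kept alongside them.
sizes = (size(first2), size(last2), size(minmax))
```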
from dataframes.jl.
Thanks for the follow-up. When you create a view you are pointing to particular indices in the parent. And if you do another slicing on the view with mapcols, the result should be the two reductions combined in the parent. Let's see:
julia> df = DataFrame([1:6;; 7:12;; 13:18;; 'a':'f';; 'f':'k';; ones(6)], string.('a':'f'))
6×6 DataFrame
 Row │ a    b    c    d    e    f
     │ Any  Any  Any  Any  Any  Any
─────┼──────────────────────────────
   1 │ 1    7    13   a    f    1.0
   2 │ 2    8    14   b    g    1.0
   3 │ 3    9    15   c    h    1.0
   4 │ 4    10   16   d    i    1.0
   5 │ 5    11   17   e    j    1.0
   6 │ 6    12   18   f    k    1.0

julia> vdf = @view df[3:end, 2:4]
4×3 SubDataFrame
 Row │ b    c    d
     │ Any  Any  Any
─────┼───────────────
   1 │ 9    15   c
   2 │ 10   16   d
   3 │ 11   17   e
   4 │ 12   18   f

# let's emulate the view with a data frame
julia> dft = df[3:end, 2:4]
4×3 DataFrame
 Row │ b    c    d
     │ Any  Any  Any
─────┼───────────────
   1 │ 9    15   c
   2 │ 10   16   d
   3 │ 11   17   e
   4 │ 12   18   f

## the modification of the view (this would damage the view: out of bounds)
julia> mapcols(x -> x[3:4], dft)
2×3 DataFrame
 Row │ b    c    d
     │ Any  Any  Any
─────┼───────────────
   1 │ 11   17   e
   2 │ 12   18   f

# from here on this should be done programmatically (by a method), but I am doing it manually to illustrate the output.
julia> ndf = DataFrame(a=[], b=[], c=[], d=[], e=[], f=[]);

## expected modification of the original DataFrame if mapcols! accepted a SubDataFrame
julia> append!(ndf, [df[5:6, [:a]] mapcols(x -> x[3:4], dft) df[5:6, [:e, :f]]]) ## see the remaining original indexing after slicing
2×6 DataFrame
 Row │ a    b    c    d    e    f
     │ Any  Any  Any  Any  Any  Any
─────┼──────────────────────────────
   1 │ 5    11   17   e    j    1.0
   2 │ 6    12   18   f    k    1.0
Now, to do this I'd guess the modified indices must be tracked. But what troubles me now is what happens when you use an aggregating function; the only thing I came up with was that mapcols applies a group-by to the columns it is not changing. I know that all this slicing can damage the view, but the analyst should be aware of this. I hope these thoughts help in some way.
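On the index tracking: a SubDataFrame already records which parent rows and columns it covers, and parentindices exposes them. A small sketch, assuming typed columns and a row-preserving function (both are my assumptions for illustration), where the change is written through the view by hand rather than by mapcols!:

```julia
using DataFrames

# Illustrative parent data frame
df = DataFrame(a=1:6, b=7:12, c=13:18)

# View over rows 3..6 and columns 2..3 of the parent
vdf = @view df[3:end, 2:3]

# The view knows which parent rows and columns it refers to
rows, cols = parentindices(vdf)

# A row-preserving modification can be broadcast in place through the view;
# only the viewed cells of the parent change
for name in names(vdf)
    vdf[:, name] .= vdf[:, name] .- 2
end
```

This only works because the function keeps the number of rows; once the length changes, the tracked indices no longer say what the rest of the parent should look like.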
from dataframes.jl.
There are two issues with what you propose:
- Why do you assume in the output that the filtered-out rows in the sub data frame should be dropped? (This is a design issue and could be discussed - I would assume they should not be dropped.)
- Why did you keep rows 5 and 6 from the original data frame (in the columns not in the view)? How could Julia know which rows to keep? To maybe better show the issue: assume that mapcols! increases the number of rows, e.g. the function modifying the data were x -> ones(100); what should then happen with the filtered-out columns?
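The growing case can be reproduced on a copy of the selected columns. This is only a sketch with made-up data; the function x -> ones(100) is the one from the comment above:

```julia
using DataFrames

# Illustrative parent data frame with 6 rows
df = DataFrame(a=1:6, b=7:12, c=13:18)

# mapcols on a copy of columns 2 and 3, with a function that grows them
grown = mapcols(x -> ones(100), df[:, 2:3])

# The selected columns now have 100 rows, while column "a" in df still has 6
row_counts = (nrow(grown), nrow(df))
```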
from dataframes.jl.
For the first point, you are right, I am assuming it, but that could be an option (something like materialize = true). Otherwise, we would drop from the parent only the rows that the slicing in mapcols removes from the view. Let's see the output:
4×6 DataFrame
 Row │ a    b    c    d    e    f
     │ Any  Any  Any  Any  Any  Any
─────┼──────────────────────────────
   1 │ 1    7    13   a    f    1.0
   2 │ 2    8    14   b    g    1.0
   3 │ 5    11   17   e    j    1.0
   4 │ 6    12   18   f    k    1.0
Now for the expansion: with that function you would be substituting the values and expanding, and once you go past the original bounds you should get missing in the rest of the columns that are not in the view:
 Row │ a        b    c    d    e        f
     │ Any      Any  Any  Any  Any      Any
─────┼──────────────────────────────────────────
   1 │ 1        7    13   a    f        1.0
   2 │ 2        8    14   b    g        1.0
   3 │ 3        1    1    1    h        1.0
   4 │ 4        1    1    1    i        1.0
   5 │ 5        1    1    1    j        1.0
   6 │ 6        1    1    1    k        1.0
   7 │ missing  1    1    1    missing  missing
   8 │ missing  1    1    1    missing  missing
You could have a keyword for the fill value.
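The padding idea can be emulated by hand today. A minimal sketch, where pad is a hypothetical helper of mine (not part of DataFrames.jl) and the data and the fill value missing are assumptions for illustration:

```julia
using DataFrames

# Illustrative parent data frame with 6 rows
df = DataFrame(a=1:6, b=7:12, e='f':'k')

# Column "b": keep rows 1..2, replace rows 3.. with a longer vector of ones
newb = vcat(df.b[1:2], ones(Int, 6))
n = length(newb)

# Hypothetical helper: extend a column to length n with `missing`
pad(col, n) = vcat(col, fill(missing, n - length(col)))

# Rebuild a consistent data frame; untouched columns are padded
out = DataFrame(a=pad(df.a, n), b=newb, e=pad(df.e, n))
```

This builds a new data frame rather than modifying the parent in place, which sidesteps the question of what a view should do when its parent changes size.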
from dataframes.jl.