Comments (5)
I am not sure I agree. Wouldn't that be the same as a DF.filter beforehand? In any case, we should at least improve the error message. :)
from explorer.
In this case it'd be the same, but mine is just a minimal example. The original example from elixirforum isn't equivalent.
I don't see why we shouldn't support it. But if we can't for some reason, then definitely an improved error message is the way to go.
from explorer.
I am not sure I agree. Wouldn't that be the same as a DF.filter beforehand? In any case, we should at least improve the error message. :)
The group_by makes DF.filter not entirely viable without backfilling some column values after the fact.
For example, our current approach looks like this. In the future, we will also have 3 more of these aggregations.
I have to get the distinct values of sim_idx to use in a join later, so that we can backfill any group that the drop_nil removes entirely.
I believe that filtering a series inside summarise would make that
Really, what I want to do for each column of interest inside the group is "give me the first non-nil value or, if the series only has nil, then 'none'."
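# The query macros (mutate, summarise) need Explorer.DataFrame to be required.
require Explorer.DataFrame, as: DataFrame

# Distinct group keys, joined back later to restore groups that drop_nil removes.
sim_idx = data_frame |> DataFrame.distinct([:sim_idx])

# First person_id per sim_idx whose result is any of "one".."four".
any_data_frame =
  data_frame
  |> DataFrame.mutate(
    any_id:
      if result in ["one", "two", "three", "four"] do
        person_id
      else
        nil
      end
  )
  |> DataFrame.drop_nil([:any_id])
  |> DataFrame.group_by(["sim_idx"])
  |> DataFrame.summarise(any: first(any_id))
  |> DataFrame.join(sim_idx, on: [:sim_idx], how: :right)

# First person_id per sim_idx whose result is "two".
two_data_frame =
  data_frame
  |> DataFrame.mutate(
    two_id:
      if result == "two" do
        person_id
      else
        nil
      end
  )
  |> DataFrame.drop_nil([:two_id])
  |> DataFrame.group_by(["sim_idx"])
  |> DataFrame.summarise(two: first(two_id))
  |> DataFrame.join(sim_idx, on: [:sim_idx], how: :right)

# Combine both summaries and backfill the groups that had no match.
DataFrame.join(any_data_frame, two_data_frame, on: [:sim_idx])
|> DataFrame.mutate(
  any: fill_missing(any, "none"),
  two: fill_missing(two, "none")
)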
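To make the per-group rule quoted above concrete ("first non-nil value, or 'none' if the group only has nil"), here is a minimal sketch in plain Elixir. It does not use Explorer at all; the `FirstNonNil` module, its function names, and the sample rows are hypothetical and exist only to illustrate the semantics the Explorer pipeline is trying to reach.

```elixir
# Plain-Elixir illustration only; the module and sample data are hypothetical.
defmodule FirstNonNil do
  @doc "Returns the first non-nil value in `values`, or `default` if every value is nil."
  def first_or(values, default \\ "none") do
    Enum.find(values, default, &(not is_nil(&1)))
  end

  @doc "Applies `first_or/2` to `column` within each `sim_idx` group of `rows`."
  def per_group(rows, column) do
    rows
    |> Enum.group_by(& &1.sim_idx, &Map.get(&1, column))
    |> Map.new(fn {idx, values} -> {idx, first_or(values)} end)
  end
end

rows = [
  %{sim_idx: 1, any_id: nil},
  %{sim_idx: 1, any_id: "p7"},
  %{sim_idx: 2, any_id: nil}
]

result = FirstNonNil.per_group(rows, :any_id)
IO.inspect(result)
```

Note that group 2, which holds only nil, still appears in the result with the default value. That is the behaviour the join on the distinct sim_idx values is emulating in the Explorer version.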
I might be misunderstanding, but the dplyr docs seem to imply that their API can do grouped filtering: https://dplyr.tidyverse.org/articles/grouping.html?q=summ#filter
from explorer.
But, as I send that, I see that DF.filter works with groups... which is what I think Jose was saying.
Let me try that out 🤦
from explorer.
Yeah, so that method can work, but it seems like my previous workaround, just rearranged.
I think the key thing is that the call to DF.summarise after the call to DF.filter will not summarise any grouped values if they were filtered out.
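A small sketch of that point, assuming Explorer is installed through `Mix.install` (the version requirement and the sample data are assumptions, not taken from this thread). It filters first, summarises per group, and then joins back onto the distinct sim_idx values so that a group whose rows were all filtered out can still be backfilled with "none".

```elixir
# Sketch only: the version requirement and sample data are assumptions.
Mix.install([{:explorer, "~> 0.8"}])

require Explorer.DataFrame, as: DF

df =
  DF.new(
    sim_idx: [1, 1, 2],
    result: ["one", "five", "five"],
    person_id: ["a", "b", "c"]
  )

# Filter, then summarise per group. Per the comment above, a group with no
# surviving rows (here sim_idx 2) is not summarised at all.
summary =
  df
  |> DF.filter(result == "one")
  |> DF.group_by(["sim_idx"])
  |> DF.summarise(any: first(person_id))

# Join back onto every sim_idx and backfill the groups that disappeared.
backfilled =
  df
  |> DF.distinct(["sim_idx"])
  |> DF.join(summary, on: ["sim_idx"], how: :left)
  |> DF.mutate(any: fill_missing(any, "none"))

IO.inspect(DF.to_columns(backfilled))
```

This is the same shape as the earlier workaround: the filter replaces the mutate plus drop_nil step, but the join and fill_missing are still needed.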
from explorer.