Comments (3)
@flying-sheep, thanks for creating this issue. Please provide a minimal reproducer (see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports for guidelines) to allow us to investigate your problem.
Not really necessary, since you can see the problem directly in the linked line, but sure.
import pandas as pd
import dask.dataframe as dd

custom_sum = dd.Aggregation(
    name='custom_sum',
    chunk=lambda s: s.sum(),
    agg=lambda s0: s0.sum(),
)

df = pd.DataFrame(dict(a=[1, 1, 1, 1, 1], g=[5, 6, 6, 6, 7]))
ddf = dd.from_pandas(df, npartitions=2)

ddf.groupby('g')['a'].agg(sum="sum").compute()  # works
ddf.groupby('g')['a'].agg(sum=custom_sum).compute()  # broken
Thanks for adding the reproducer! Please keep in mind that the time of contributors is limited, so having a reproducer ready allows us to move more quickly. It is also a great starting point for a regression test.
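For a regression test, it may help to have the expected per-group sums computed independently of dask and pandas. The sketch below is plain Python and only imitates the two-step chunk/agg idea from the reproducer; the helper `grouped_sum` and the partition split are hypothetical and do not reflect how dask actually partitions or aggregates the data.

```python
from collections import defaultdict

# Same data as in the reproducer.
a = [1, 1, 1, 1, 1]
g = [5, 6, 6, 6, 7]


def grouped_sum(pairs):
    """Sum values per key; stands in for both the chunk and the agg step."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


rows = list(zip(g, a))

# Hypothetical split into two partitions; dask chooses its own boundaries.
partitions = [rows[:3], rows[3:]]

# "chunk" step: per-partition group sums.
partials = [grouped_sum(part) for part in partitions]

# "agg" step: sum the partial results again, grouped by the same key.
expected = grouped_sum(kv for partial in partials for kv in partial.items())

print(expected)  # {5: 1, 6: 3, 7: 1}
```

Both lines at the end of the reproducer should produce these per-group values once the custom aggregation is accepted as a keyword argument.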