Comments (3)
One simple comparison that would be useful is how the memory consumption of standard sklearn RandomForests compares on dataframes of the same size, since much of the EconML tree code was forked from sklearn (version 0.24, I believe).
Although 180GB does seem excessive, I don't think it is really exponential - if your input has 40M floating point values, the raw data for that alone is 320MB, so this is ~560 times the size of your dataset. Certainly if we can easily optimize things to bring this down we should, but it's not even quadratic in the number of elements.
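The arithmetic behind that estimate can be checked with a short sketch. It assumes 8-byte (float64) values and decimal units (1 GB = 10^9 bytes); the variable names are only illustrative.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions: float64 values (8 bytes each) and decimal GB/MB.
n_values = 40_000_000            # floating point values in the input
bytes_per_value = 8              # float64
raw_bytes = n_values * bytes_per_value

observed_bytes = 180 * 10**9     # the reported 180GB
ratio = observed_bytes / raw_bytes

print(f"raw data: {raw_bytes / 10**6:.0f} MB")
print(f"observed / raw: {ratio:.1f}x")
```

A constant factor of this size is large, but it is far from what quadratic growth in the number of elements would produce.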
You mention that memory is high for both fit and effect: do you mean that while running those methods memory usage spikes but then comes back down to a more reasonable amount when the method calls complete?
> You mention that memory is high for both fit and effect: do you mean that while running those methods memory usage spikes but then comes back down to a more reasonable amount when the method calls complete?
Yes, memory usage spikes, but then comes back down.
I'm still investigating what happens inside fit, but in predict_point_and_var I found that the memory spike comes after the second Parallel call inside the var condition, so I think the spike probably originates in these lines:
EconML/econml/grf/_base_grf.py
Lines 703 to 763 in db1e254
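One way to confirm a "spikes, then comes back down" pattern is to compare peak and retained memory around a single call. The following is a minimal sketch using the standard-library `tracemalloc` module; `measure_peak_and_retained` and `spiky` are hypothetical names for illustration and are not part of EconML. Note that `tracemalloc` only sees allocations made in the current process, so memory used by separate worker processes would not show up.

```python
import tracemalloc


def measure_peak_and_retained(fn, *args, **kwargs):
    """Run fn and return (result, peak_bytes, retained_bytes) as traced by tracemalloc."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        # current = memory still held after the call; peak = maximum during the call
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak, current


def spiky(n):
    # Hypothetical stand-in for a method call: builds a large temporary
    # and returns only a small result, so the temporary is freed on return.
    temp = [0.0] * n
    return len(temp)


result, peak, retained = measure_peak_and_retained(spiky, 1_000_000)
print(f"peak: {peak / 10**6:.1f} MB, retained: {retained / 10**6:.3f} MB")
```

In practice the call to `spiky` would be replaced by the method under investigation, for example a call to the estimator's `fit` or `effect`.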
Another important detail: I was using a treatment dataframe with a featurizer, which gave me 6 columns in T. While inspecting the code, I saw that many steps use a cross product of T with T. I think this contributes to the memory spike too.
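To get a rough sense of how much a T-by-T product could add, here is a small NumPy sketch. It is not the EconML code: it assumes that "cross product of T over T" means a per-row outer product, and `per_row_outer`, the row count, and the random data are all hypothetical.

```python
import numpy as np


def per_row_outer(T):
    """Per-row outer product: for T of shape (n, d), return an array of shape (n, d, d)."""
    return np.einsum("ij,ik->ijk", T, T)


n, d = 1000, 6                       # hypothetical row count; 6 treatment columns as above
rng = np.random.default_rng(0)
T = rng.normal(size=(n, d))          # float64 by default

TT = per_row_outer(T)
print(T.nbytes, TT.nbytes)           # the product is d times larger than T itself
```

Under this assumption each such intermediate needs n * d * d * 8 bytes, so with 6 treatment columns it is 6 times the size of T, and several of them alive at once (for example, one per parallel task) would multiply that further.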