Thanks @nikhilwoodruff for having taken the time to formalise this!
Since this is a performance optimisation, may I suggest first running a performance analysis? It is often hard to measure precisely the performance impact of a specific implementation and its optimisations, especially in the context of vectorised computing.
Lacking specific rules for such cases, I would personally ask for a clear measurement of expected gains before supporting the addition of such a feature to Core. You can see an example of such a demonstration in openfisca/openfisca-core#1027.
Currently, I do not know of major needs for performance optimisation internationally on the computing side; the needs are rather on the memory side. If this need is specific to the case of the US and its 50 states (where each branch of the disjunction logically covers about 2% of cases), maybe creating a helper that wraps systematic calls to `where()`, or leveraging extensions, could help? I know New South Wales went for an architecture with strong splitting through extensions; this might be of inspiration.
cc @benjello @maukoquiroga @liamdmccann
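As a rough illustration of the helper idea, here is a minimal sketch assuming a per-person `state_code` array and one zero-argument callable per state. `select_by_state` and the rates are hypothetical, not part of OpenFisca Core or openfisca-tools. It only removes the boilerplate of chained `np.where()` calls: every state's formula is still evaluated over the whole population.

```python
from typing import Callable, Mapping

import numpy as np


def select_by_state(
    state_code: np.ndarray,
    formulas: Mapping[str, Callable[[], np.ndarray]],
    default: float = 0.0,
) -> np.ndarray:
    """Hypothetical helper: give each person the value from their state's formula.

    People whose state has no formula get `default`.
    """
    result = np.full(state_code.shape, default, dtype=float)
    for code, formula in formulas.items():
        # Each formula returns a full-population array; keep it only where it applies.
        result = np.where(state_code == code, formula(), result)
    return result


# Usage with made-up rates.
state = np.array(["MA", "NY", "CA", "MA"])
wage = np.array([50_000.0, 60_000.0, 70_000.0, 80_000.0])
tax = select_by_state(
    state,
    {
        "MA": lambda: 0.05 * wage,
        "NY": lambda: 0.06 * wage,
    },
)
# tax holds 2500, 3600, 0 (CA has no formula here) and 4000.
```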
from openfisca-tools.
- Disclaimer: I did not read your implementation in the PR
- I don't think that casting the formula to a smaller population size will increase your speed performance
- I recognize that there is a real memory problem. You can somehow make it less important by populating the `cache_blacklist` of the simulation with intermediate variables that are computed only once.
- The real solution to this problem is IMHO (and you may already have heard that from me) to have relational / correspondence links between entities. In this case, you would have MA residents as a separate entity with its own variables (`ma_tax`), linked one-to-one to the larger US resident entity with US-wide generic variables (`wage`, `residence`). I actually see this need as a recurrent one (linking firms to employees, employees to retirement schemes, and so on).
cc @eraviart
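A minimal sketch of the one-to-one link idea, using plain NumPy rather than any existing OpenFisca Core API: `OneToOneLink`, `project` and `expand` are hypothetical names, and the 5% rate is made up. The MA resident entity stores, for each of its members, the row of the corresponding US resident, so MA-specific variables live in arrays sized to MA residents only.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class OneToOneLink:
    """Hypothetical one-to-one link from a sub-entity to a parent entity.

    parent_index[i] is the row, in the parent entity, of sub-entity member i.
    """

    parent_index: np.ndarray
    parent_size: int

    def project(self, parent_values: np.ndarray) -> np.ndarray:
        """Read a parent-entity variable for each sub-entity member."""
        return parent_values[self.parent_index]

    def expand(self, sub_values: np.ndarray, fill: float = 0.0) -> np.ndarray:
        """Write a sub-entity variable back onto the parent entity."""
        out = np.full(self.parent_size, fill, dtype=float)
        out[self.parent_index] = sub_values
        return out


# US-wide generic variables, one row per US resident.
state = np.array(["MA", "NY", "MA", "CA"])
wage = np.array([50_000.0, 60_000.0, 80_000.0, 70_000.0])

# The MA resident entity only has rows for people living in MA.
ma_link = OneToOneLink(parent_index=np.flatnonzero(state == "MA"), parent_size=len(state))

# MA-specific variable, computed on an array of MA residents only.
ma_tax = 0.05 * ma_link.project(wage)  # shape (2,)
ma_tax_per_us_resident = ma_link.expand(ma_tax)  # shape (4,), zeros outside MA
```

With this layout, the memory used by `ma_tax` scales with the number of MA residents rather than with the full US population; whether the extra lookups pay off in speed is a separate question to measure.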
Thanks both for these contributions!
@MattiSG - yes, I'll add a performance analysis. In the US case I'm guessing this roughly equates to a 50x speedup for each affected variable. As for extensions, I think they might not work for our use case, because we might want to see for example how a change to federal tax law affects total state tax liability, in which case we'd need all the individual state variables to run together.
> I don't think that casting the formula to a smaller population size will increase your speed performance

Could you elaborate? My thinking is that since the array sizes are smaller, and NumPy does iterate over them at some low level, it'll be quicker.
> You can somehow make it less important by populating the `cache_blacklist` of the simulation by intermediate variables that are computed only once.

Nice, thanks for this tip. Will keep it in mind here.
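To make the `cache_blacklist` tip concrete, here is a toy sketch of the mechanism, assuming a simple memoising calculator. `TinySimulation` and its methods are hypothetical and do not reproduce how OpenFisca Core implements caching; the variable names and amounts are made up.

```python
from typing import Callable, Dict, Iterable

import numpy as np


class TinySimulation:
    """Toy memoising calculator (hypothetical; not the OpenFisca Core API).

    Computed variables are cached so that dependent formulas can reuse them,
    except for names listed in `cache_blacklist`: those are recomputed on
    request and their arrays are never kept in memory.
    """

    def __init__(
        self,
        formulas: Dict[str, Callable[["TinySimulation"], np.ndarray]],
        cache_blacklist: Iterable[str] = (),
    ) -> None:
        self.formulas = formulas
        self.cache_blacklist = set(cache_blacklist)
        self._cache: Dict[str, np.ndarray] = {}
        self.evaluations: Dict[str, int] = {}  # how often each formula ran

    def calculate(self, name: str) -> np.ndarray:
        if name in self._cache:
            return self._cache[name]
        self.evaluations[name] = self.evaluations.get(name, 0) + 1
        value = self.formulas[name](self)
        if name not in self.cache_blacklist:
            self._cache[name] = value
        return value

    def cached_variables(self) -> set:
        return set(self._cache)


formulas = {
    "wage": lambda sim: np.array([50_000.0, 60_000.0, 80_000.0]),
    # Intermediate variable: only ma_tax reads it, so it is computed once.
    "taxable_income": lambda sim: np.maximum(sim.calculate("wage") - 4_400.0, 0.0),
    "ma_tax": lambda sim: 0.05 * sim.calculate("taxable_income"),
}
sim = TinySimulation(formulas, cache_blacklist={"taxable_income"})
ma_tax = sim.calculate("ma_tax")
print(sorted(sim.cached_variables()))  # ['ma_tax', 'wage']
```

The trade-off is that a blacklisted variable requested a second time is recomputed, which is why the tip targets intermediate variables that are computed only once.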
> The real solution to this problem is IMHO (and you may already have heard that from me) to have relational / correspondence links between entities.

Yes, I really like this idea, and it would be a great feature to add to Core, IMO.
> I don't think that casting the formula to a smaller population size will increase your speed performance
>
> Could you elaborate? My thinking is that since the array sizes are smaller, and NumPy does iterate over them at some low level, it'll be quicker.

NumPy uses the vector capabilities of modern processors. As long as the data fits in RAM, the speed of the computation does not depend much on vector size.
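One way to run the performance analysis requested earlier, and to test the claim above, is a micro-benchmark that compares evaluating a formula over the whole population and masking with `np.where()` against evaluating it only on the in-state rows and scattering the results back. This is a sketch under assumptions: `formula` is a made-up stand-in for a state tax formula, the 2% share mirrors the 50-state estimate, and timings depend heavily on hardware, NumPy version and formula complexity, so no particular speedup is implied.

```python
import timeit

import numpy as np


def formula(wage: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for a state tax formula: a few vectorised operations.
    return np.maximum(wage - 4_400.0, 0.0) * 0.05 + np.minimum(wage, 10_000.0) * 0.001


def full_population(wage: np.ndarray, in_state: np.ndarray) -> np.ndarray:
    # Current approach: evaluate over everyone, then keep in-state values.
    return np.where(in_state, formula(wage), 0.0)


def subset_only(wage: np.ndarray, in_state: np.ndarray) -> np.ndarray:
    # Proposed approach: evaluate only in-state rows, then scatter back.
    out = np.zeros_like(wage)
    out[in_state] = formula(wage[in_state])
    return out


def benchmark(n: int = 1_000_000, share: float = 0.02, repeat: int = 5, seed: int = 0) -> dict:
    rng = np.random.default_rng(seed)
    wage = rng.lognormal(mean=10.5, sigma=1.0, size=n)
    in_state = rng.random(n) < share
    # Both approaches must agree before comparing their speed.
    assert np.allclose(full_population(wage, in_state), subset_only(wage, in_state))
    timings = {}
    for fn in (full_population, subset_only):
        # Best of `repeat` single runs, in seconds.
        timings[fn.__name__] = min(
            timeit.repeat(lambda: fn(wage, in_state), number=1, repeat=repeat)
        )
    return timings


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        print(n, benchmark(n))
```

The subset path still pays for boolean indexing over the full-length mask (gathering and scattering), so its gain is expected to be smaller than the ratio of array sizes; running it at several values of `n` shows how both approaches scale.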