
openfisca-tools's Introduction

PolicyEngine

This repository contains the core infrastructure for policyengine.org. Namely:

  • policyengine, a Python package which contains the server-side implementations, and
  • policyengine-client, a React library containing high-level components to build the client-side interface.

Development

NOTE: requires Python 3.7

First, ensure you have pnpm installed: https://pnpm.io/installation.

Then run make install. To debug the client, run make debug-client; to debug the server, run make debug-server.

If your changes involve the server, change useLocalServer = false; to useLocalServer = true; in policyengine-client/src/countries/country.jsx. Otherwise, change usePolicyEngineOrgServer = false; to usePolicyEngineOrgServer = true; in policyengine-client/src/countries/country.jsx.

If you don't have access to the UK Family Resources Survey, you can still run the UK population-wide calculator on an anonymised version. To do that, run UK_SYNTHETIC=1 make debug-server instead of make debug-server.


openfisca-tools's Issues

Catch `Microsimulation.df("col")`

Microsimulation.df requires a list of column names. When passing a single string instead, it throws:

sim.df("state_code")

KeyError: 's'

Could be more informative or just listify args.
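The KeyError: 's' is consistent with the string being iterated character by character, so "state_code" is treated as the columns "s", "t", "a", and so on. A minimal sketch of the listify option, assuming a hypothetical helper named listify that is not part of the existing API:

```python
from typing import List, Sequence, Union


def listify(columns: Union[str, Sequence[str]]) -> List[str]:
    """Wrap a single column name in a list; copy any other sequence into a list."""
    if isinstance(columns, str):
        # A bare string would otherwise be iterated one character at a time.
        return [columns]
    return list(columns)


print(listify("state_code"))
print(listify(["state_code", "snap_gross_income_fpg_ratio"]))
```

Calling such a helper at the top of df would let sim.df("state_code") behave like sim.df(["state_code"]).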

`defined_for` doesn't work for simulation-defining formulas

This is a pretty complex edge case I didn't consider when writing the defined_for logic. The way that defined_for works is by intercepting the entity(variable, period) calls inside a subsetted variable's formula and pre-subsetting them, so normal operations on them return the subsetted population results. But no interception happens when a formula creates a new simulation and uses outputs from simulation.calculate.

Accept single columns to `add`

Currently add(household, period, "column") throws an uninformative error message at the calc stage. It'd be easier to automatically listify it.

Work (or fail gracefully) when using `deriv` on a variable defined at an entity that sim lacks

See PolicyEngine/policyengine-us#693, which shows that an openfisca-us IndividualSim deriv call fails when calculating the derivative of a variable defined at an entity absent from the sim. In that example, snap is at the SPM unit level, but the sim only has a person.

This gets into the broader issue we've discussed, that it would be nice if individuals in an IndividualSim were automatically combined to higher-level entities. I think that'd fix the case where the sim only has a person. I'm not sure what would work in the household case.

In the meantime, a more informative error message would help.

`Microsimulation.df` throws `TypeError` with some sequences of variables

This works:

from openfisca_us import Microsimulation
sim = Microsimulation()
sim.df(["state_code", "snap_gross_income_fpg_ratio"])

but this doesn't:

sim.df(["snap_gross_income_fpg_ratio", "state_code"])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-856cb2e0dc06> in <module>()
----> 1 df = sim.df(["snap_gross_income_fpg_ratio", "state_code"])
      2 ca_below_fpl = df[(df.snap_gross_income_fpg_ratio < 1) & (df.state_code == "CA")]
      3 ca_below_fpl

3 frames
/usr/local/lib/python3.7/dist-packages/openfisca_tools/microsimulation.py in map_to(self, arr, entity, target_entity, how)
    216                 return entity_pop.project(arr)
    217             if how == "mean":
--> 218                 return entity_pop.project(arr / entity_pop.nb_persons())
    219         elif entity == target_entity:
    220             return arr

TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

state_code is a household-level string, while snap_gross_income_fpg_ratio is an SPM-unit-level float.
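The traceback suggests the string-valued variable reaches the how == "mean" branch and is divided by a person count. A minimal sketch of a guard that fails with a clearer message, assuming a hypothetical helper share_per_person (the name and the message are illustrative, not the library's code):

```python
import numpy as np


def share_per_person(values, nb_persons):
    """Split group-level values evenly across members, rejecting non-numeric input."""
    values = np.asarray(values)
    if not np.issubdtype(values.dtype, np.number):
        # Strings (and other non-numeric dtypes) cannot be divided.
        raise TypeError(
            f"Cannot divide values of dtype {values.dtype} among members; "
            "only numeric variables can be averaged."
        )
    return values / np.asarray(nb_persons)


print(share_per_person([10.0, 6.0], [2, 3]))

try:
    share_per_person(np.array(["CA", "NY"]), [2, 1])
except TypeError as error:
    print(error)
```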

Remove `amount_over`

I'm unsure of the value of amount_over, which is currently used inconsistently in place of max_. I can see how it's more descriptive, but it's not completely obvious which argument is over which, and max_ will be familiar to more developers. It's also a bit less concise:

amount_over(x, y)
max_(x - y, 0)

I'd favor removing it at this point, but if we keep it, I'd suggest switching all relevant max_ statements over to it in openfisca-uk and openfisca-us.
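For reference, the equivalence between the two spellings can be sketched as follows, assuming max_ is an element-wise maximum like np.maximum and that amount_over takes the amount first and the threshold second (both are assumptions about the existing helpers):

```python
import numpy as np


def max_(x, y):
    # Assumed to behave like numpy's element-wise maximum.
    return np.maximum(x, y)


def amount_over(amount, threshold):
    """Return how far `amount` exceeds `threshold`, floored at zero."""
    return max_(amount - threshold, 0)


income = np.array([5_000, 12_000, 20_000])
threshold = 12_000

# Both spellings give the same array.
print(amount_over(income, threshold))
print(max_(income - threshold, 0))
```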

Add `multiply`, `and_`, and `or_`

e.g. to replace this:

would_claim_CTC = benunit("would_claim_CTC", period)
claims_legacy_benefits = benunit("claims_legacy_benefits", period)
return would_claim_CTC & claims_legacy_benefits

with this:

return and_(benunit, period, ["would_claim_CTC", "claims_legacy_benefits"])

or if an or condition:

return or_(benunit, period, ["would_claim_CTC", "claims_legacy_benefits"])
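One possible shape for these helpers, as a sketch only: the entity is replaced by a hypothetical stand-in callable that looks values up in a dict, and the period is a plain integer.

```python
from functools import reduce

import numpy as np


def and_(entity, period, variables):
    """Element-wise AND of several boolean variables on one entity."""
    return reduce(np.logical_and, (entity(v, period) for v in variables))


def or_(entity, period, variables):
    """Element-wise OR of several boolean variables on one entity."""
    return reduce(np.logical_or, (entity(v, period) for v in variables))


# Hypothetical stand-in for an OpenFisca entity: maps a variable name to an array.
values = {
    "would_claim_CTC": np.array([True, True, False]),
    "claims_legacy_benefits": np.array([True, False, False]),
}


def benunit(variable, period):
    return values[variable]


names = ["would_claim_CTC", "claims_legacy_benefits"]
both = and_(benunit, 2022, names)
either = or_(benunit, 2022, names)
print(both, either)
```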

Accept `IndividualSim(reform=None)`

This could be helpful for functions, e.g. currently I'm doing this:

def single_person_sim(reform=None):
    if reform is None:  # Breaks if passing IndividualSim(None).
        sim = IndividualSim(year=2022)
    else:
        sim = IndividualSim(reform, year=2022)
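A sketch of how a constructor could accept reform=None, using a hypothetical stand-in class rather than the real IndividualSim (whose internals are not shown here):

```python
class IndividualSimSketch:
    """Hypothetical stand-in showing one way to tolerate reform=None."""

    def __init__(self, reform=None, year=2022):
        if reform is None:
            # Normalise "no reform" to an empty tuple so later code can iterate.
            self.reforms = ()
        elif isinstance(reform, (list, tuple)):
            self.reforms = tuple(reform)
        else:
            self.reforms = (reform,)
        self.year = year


def single_person_sim(reform=None):
    # No branching needed once the constructor handles None itself.
    return IndividualSimSketch(reform, year=2022)


print(single_person_sim().reforms)
print(single_person_sim("example_reform").reforms)
```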

Automatically list-ify single references

Often when writing parameter YAML files, we specify an object rather than a list of objects:

reference:
  title: x
  href: y

instead of

reference:
  - title: x 
    href: y

We should have a function f: ParameterNode -> ParameterNode that automatically applies this correction.
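A minimal sketch of the correction, operating on plain nested dicts (as produced by parsing the YAML) rather than on ParameterNode objects; the function name listify_references is hypothetical:

```python
def listify_references(node):
    """Recursively wrap any dict-valued `reference` in a one-element list."""
    if isinstance(node, dict):
        fixed = {}
        for key, value in node.items():
            if key == "reference" and isinstance(value, dict):
                # A single reference object becomes a list of one object.
                fixed[key] = [value]
            else:
                fixed[key] = listify_references(value)
        return fixed
    if isinstance(node, list):
        return [listify_references(item) for item in node]
    return node


parameter = {
    "description": "Example parameter",
    "metadata": {"reference": {"title": "x", "href": "y"}},
}
print(listify_references(parameter))
```

References that are already lists pass through unchanged, so the function can be applied more than once without harm.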

Change `and_` and `or_` to use `&` and `|` instead of `*` and `+` operators

Currently, and_ and or_ are aliases for add_ and multiply_, respectively (or vice versa; either way, they're duplicative). add_ and multiply_ apply + and * operators, respectively. I'd suggest that and_ and or_ instead apply & and | operators, respectively.

This won't change the result: np.array(bool) * np.array(bool) = np.array(bool), for example. But it would be more explicit, and could improve performance.

Relevant code:

agg_func = dict(
    add=lambda x, y: x + y, multiply=lambda x, y: x * y, max=max_, min=min_
)[agg_func]
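The claim that switching operators leaves results unchanged can be checked directly on boolean arrays. The sketch below also shows that the two families of operators diverge on integer arrays, so the equivalence relies on the inputs being boolean:

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

# On boolean arrays, arithmetic and bitwise operators agree.
same_and = np.array_equal(a * b, a & b)
same_or = np.array_equal(a + b, a | b)
print(same_and, same_or)

# On integer arrays they differ.
x = np.array([2, 1])
y = np.array([1, 1])
product = x * y  # element-wise multiplication
bitwise = x & y  # element-wise bitwise AND
print(product, bitwise)
```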

Add `all_`

any_ is currently an alias for or_, but we don't have a parallel alias for all_ to and_:

or_ = add
any_ = or_
multiply = and_

I think we should adopt a standard for OpenFisca programming to use only one of these patterns. Since we call these as functions, I'd suggest any_ and all_, which more closely resemble the numpy and Python versions than and_ or or_ do.

That said, I'm indifferent on keeping or_ and and_ around. Python and numpy offer all four in some way, so maybe we could offer a warning that any_ and all_ are the standards and we suggest those instead, rather than breaking code? Open to suggestions here.
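A sketch of the warning option, assuming any_ and all_ become the standard names and the old names are kept as thin wrappers. The helper _deprecated_alias and the stand-in entity are hypothetical:

```python
import warnings

import numpy as np


def any_(entity, period, variables):
    """True where at least one of the variables is true."""
    return np.any([entity(v, period) for v in variables], axis=0)


def all_(entity, period, variables):
    """True where every one of the variables is true."""
    return np.all([entity(v, period) for v in variables], axis=0)


def _deprecated_alias(new_function, old_name):
    """Build a wrapper that warns, then delegates to the preferred function."""

    def wrapper(entity, period, variables):
        warnings.warn(
            f"{old_name} is deprecated; use {new_function.__name__} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_function(entity, period, variables)

    wrapper.__name__ = old_name
    return wrapper


or_ = _deprecated_alias(any_, "or_")
and_ = _deprecated_alias(all_, "and_")

# Hypothetical stand-in for an entity: maps a variable name to an array.
values = {"a": np.array([True, False, False]), "b": np.array([True, True, False])}


def entity(variable, period):
    return values[variable]


print(any_(entity, 2022, ["a", "b"]), all_(entity, 2022, ["a", "b"]))
```

Existing code that calls or_ or and_ keeps working under this approach and only emits a DeprecationWarning.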

Function to simplify categorical eligibility-checking pattern

For example, from the US CVRP PR:

p = parameters(period).states.ca.calepa.carb.cvrp.increased_rebate
categorically_eligible = np.any(
    [
        person.spm_unit(program, period)
        for program in p.categorical_eligibility
    ],
    axis=0,
)

Could we just use add(person.spm_unit, period, p.categorical_eligibility) > 0?

any_(entity, period, variables) would be useful nonetheless.

Make `select` an alias

Making it a function seems unnecessary; it could be an alias like clip and inf:

def select(conditions, choices):
    """Selects the corresponding choice for the first matching condition in a list.

    Args:
        conditions (list): A list of boolean arrays
        choices (list): A list of arrays

    Returns:
        Array: Array of values
    """
    return np.select(conditions, choices)


clip = np.clip
inf = np.inf
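The alias form would look like this. Behaviour is the same as the wrapper because the wrapper only forwards to np.select; the alias additionally exposes np.select's default argument:

```python
import numpy as np

# Plain aliases, in the same style as clip and inf.
select = np.select
clip = np.clip
inf = np.inf

x = np.array([-5, 5, 15])
# The first matching condition wins; rows matching none get the default.
labels = select([x < 0, x < 10], [-1, 1], default=0)
print(labels)
```

One trade-off is that the alias carries numpy's docstring rather than the custom one.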

Partial formula execution

Although this might be relevant to Core, I suspect it'd need a much longer discussion to avoid breaking changes, so filing here with a view to implementing as a patch. There have been a few attempts already in #64, but with some bugs, so I thought I'd sketch out the cleanest implementation here.

The problem

Some variables are only relevant to a small subset of the population. For example, Massachusetts income tax only needs to be calculated for people and groups in Massachusetts, and not the rest of the population. Right now, we implement the tax as simply zero for those other people, but this causes wasted computation time and space for 98% of entities, because NumPy vectorised operations happen regardless of the retrospective filter at the end.

The solution

We could have the following variable definition:

class ma_tax(Variable):
  value_type = float
  label = "MA income tax"
  definition_period = YEAR
  entity = TaxUnit
  
  def eligible(tax_unit, period, parameters):
    return tax_unit.household("state_code", period) == "MA"
  
  def formula(tax_unit, period, parameters):
    ...

eligible is run first to determine the relevant subset of the population, and then the main formula runs on that subset only. This will be much more efficient iff the formula is much more complex than eligible.

#64 has a prototype of the implementation, but it's buggy and needs more thought. I think there's a clean way to do this, intercepting the population passed to the formula to only return the subset values.
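The interception idea can be sketched with plain numpy. Everything below is a hypothetical stand-in: SubsetPopulation and calculate_partial are illustrative names, the population is a dict of arrays, and the formulas omit the parameters argument for brevity.

```python
import numpy as np


class SubsetPopulation:
    """Hypothetical population wrapper that only returns eligible rows."""

    def __init__(self, data, mask):
        self.data = data  # dict: variable name -> full-length array
        self.mask = mask  # boolean array marking eligible entities

    def __call__(self, variable, period):
        # Intercept the lookup and return values for eligible entities only.
        return self.data[variable][self.mask]


def calculate_partial(data, period, eligible, formula, default=0.0):
    """Run `formula` only where `eligible` is true; fill the rest with `default`."""

    def full_population(variable, period):
        return data[variable]

    mask = eligible(full_population, period)
    result = np.full(mask.shape, default, dtype=float)
    if mask.any():
        # The formula sees subset-length arrays; results are scattered back.
        result[mask] = formula(SubsetPopulation(data, mask), period)
    return result


data = {
    "state_code": np.array(["MA", "CA", "MA", "NY"]),
    "taxable_income": np.array([50_000.0, 80_000.0, 20_000.0, 60_000.0]),
}


def eligible(tax_unit, period):
    return tax_unit("state_code", period) == "MA"


def formula(tax_unit, period):
    # Flat 5% rate, purely illustrative.
    return tax_unit("taxable_income", period) * 0.05


ma_tax = calculate_partial(data, 2022, eligible, formula)
print(ma_tax)
```

In this sketch the formula only ever operates on arrays the length of the eligible subset, which is where the saving would come from.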

cc @MattiSG, @MaxGhenis, @rickecon
