policyengine / openfisca-tools

Python tools for enhancing OpenFisca country packages.

Makefile 0.16% Python 99.84%

openfisca-tools's Introduction

PolicyEngine

This repository contains the core infrastructure for policyengine.org. Namely:

policyengine, a Python package which contains the server-side implementations, and
policyengine-client, a React library containing high-level components to build the client-side interface.

Development

NOTE: requires Python 3.7

First, ensure you have pnpm installed: https://pnpm.io/installation.

Then install with make install. To debug the client, run make debug-client; to debug the server, run make debug-server.

If your changes involve the server, change useLocalServer = false; to useLocalServer = true; in policyengine-client/src/countries/country.jsx. Otherwise, change usePolicyEngineOrgServer = false; to usePolicyEngineOrgServer = true; in policyengine-client/src/countries/country.jsx.

If you don't have access to the UK Family Resources Survey, you can still run the UK population-wide calculator on an anonymised version. To do that, run UK_SYNTHETIC=1 make debug-server instead of make debug-server.

openfisca-tools's Issues

Add piecewise formulas

Ensure `sum_of_variables` works on lower level entities

For example, see PolicyEngine/policyengine-us#816

Add `between` function or operator

As implemented in pandas, so we can do age.between(18, 64) or between(age, 18, 64) instead of (age >= 18) & (age < 65)
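
A minimal sketch of what such a helper could look like, assuming plain NumPy arrays as input. The name between, its signature, and the inclusive option are assumptions modelled on pandas.Series.between; this is not an existing openfisca-tools function.

```python
import numpy as np


def between(values, lower, upper, inclusive="both"):
    """Return a boolean array marking values that fall between two bounds.

    Hypothetical helper: the signature mirrors pandas.Series.between and
    is an assumption, not part of openfisca-tools.
    """
    values = np.asarray(values)
    if inclusive == "both":
        return (values >= lower) & (values <= upper)
    if inclusive == "left":
        return (values >= lower) & (values < upper)
    if inclusive == "right":
        return (values > lower) & (values <= upper)
    if inclusive == "neither":
        return (values > lower) & (values < upper)
    raise ValueError(f"Unknown inclusive option: {inclusive!r}")


age = np.array([17, 18, 40, 64, 65])
in_range = between(age, 18, 64)
```

With integer ages and the default inclusive bounds, between(age, 18, 64) selects the same people as (age >= 18) & (age < 65).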

Catch `Microsimulation.df("col")`

Microsimulation.df requires a list of column names. When passing a single string instead, it throws:

sim.df("state_code")

KeyError: 's'

Could be more informative or just listify args.
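
The KeyError: 's' is consistent with the method iterating over its argument, since iterating over a bare string yields single characters. A possible fix, sketched here with a hypothetical listify helper (the name is an assumption, not an existing function):

```python
def listify(columns):
    """Wrap a single column name in a list; convert other iterables to a list."""
    if isinstance(columns, str):
        return [columns]
    return list(columns)


# Iterating over a bare string yields its characters, so "state_code"
# would first be looked up as "s".
first_name_without_fix = next(iter("state_code"))

single = listify("state_code")
several = listify(("state_code", "snap"))
```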

Informative error message when calling `add` over variables that don't exist

This log makes me think this is what it yields:

AttributeError: 'NoneType' object has no attribute 'entity'

`defined_for` doesn't work for simulation-defining formulas

This is a pretty complex edge case I didn't consider when writing the defined_for logic. The way that defined_for works is by intercepting the entity(variable, period) calls inside a subsetted variable's formula and pre-subsetting them, so normal operations on them return the subsetted population results. But no interception happens when a formula creates a new simulation and uses outputs from simulation.calculate.

Combine all persons into higher-level entities in `IndividualSim`

Either by default or as a command, e.g. sim.combine_people

Use household net income in `increases_net_income`

Pattern for when a benefit is reported (skip)

cc @MaxGhenis

Accept single columns to `add`

Currently add(household, period, "column") throws an uninformative error message at the calc stage. It'd be easier to automatically listify it.

Uprating doesn't apply to scale parameters

This is preventing the calibration from using uprated income tax by income band parameters for years beyond 2021.

Update default `IndividualSim` year from 2021 to 2022

openfisca-tools/openfisca_tools/hypothetical.py

Line 29 in 9f78915

def __init__(self, reform: ReformType = (), year: int = 2021) -> None:

Work (or fail gracefully) when using `deriv` on a variable defined at an entity that sim lacks

See PolicyEngine/policyengine-us#693, which shows that an openfisca-us IndividualSim deriv call fails when calculating the derivative of a variable defined at an entity absent from the sim. In that example, snap is at the SPM unit level, but the sim only has a person.

This gets into the broader issue we've discussed: it would be nice if individuals in an IndividualSim were automatically combined into higher-level entities. I think that'd fix the case where the sim only has a person. I'm not sure what would work in the household case.

In the meantime, a more informative error message would help.

Function to generate sum-of-variables formula

Would be of the format list[str] -> function[entity, period, parameters -> float]

And then invoked via:

class variable(Variable):
  ...

  formula = sum_of_variables([var1, var2])
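
One way the factory could be written, as a sketch only. It assumes the entity is callable as entity(variable_name, period), as in OpenFisca formulas; FakeEntity is a stand-in invented for this demo so the block runs without OpenFisca.

```python
import numpy as np


def sum_of_variables(variables):
    """Build a formula that sums the named variables for the given entity."""

    def formula(entity, period, parameters):
        return sum(entity(variable, period) for variable in variables)

    return formula


class FakeEntity:
    """Stand-in for an OpenFisca population (an assumption for this demo)."""

    def __init__(self, values):
        self.values = values

    def __call__(self, variable, period):
        return self.values[variable]


household = FakeEntity(
    {"var1": np.array([1.0, 2.0]), "var2": np.array([10.0, 20.0])}
)
formula = sum_of_variables(["var1", "var2"])
total = formula(household, 2022, None)
```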

`Microsimulation.df` throws `TypeError` with some sequences of variables

This works:

from openfisca_us import Microsimulation
sim = Microsimulation()
sim.df(["state_code", "snap_gross_income_fpg_ratio"])

but this doesn't:

sim.df(["snap_gross_income_fpg_ratio", "state_code"])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-856cb2e0dc06> in <module>()
----> 1 df = sim.df(["snap_gross_income_fpg_ratio", "state_code"])
      2 ca_below_fpl = df[(df.snap_gross_income_fpg_ratio < 1) & (df.state_code == "CA")]
      3 ca_below_fpl

3 frames
/usr/local/lib/python3.7/dist-packages/openfisca_tools/microsimulation.py in map_to(self, arr, entity, target_entity, how)
    216                 return entity_pop.project(arr)
    217             if how == "mean":
--> 218                 return entity_pop.project(arr / entity_pop.nb_persons())
    219         elif entity == target_entity:
    220             return arr

TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

state_code is a household-level string, snap_gross_income_fpg_ratio is a spmu-level float.
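
The traceback shows the division arr / entity_pop.nb_persons() failing, which would happen if a string array reached the "mean" branch. Why the column order matters is not established here. As a sketch of a more informative failure, a dtype check could run before the division; mean_share is a hypothetical helper, not the actual map_to code.

```python
import numpy as np


def mean_share(arr, nb_persons):
    """Divide a group-level array by group size, rejecting non-numeric input."""
    arr = np.asarray(arr)
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError(
            f"Cannot take a per-person mean of dtype {arr.dtype}; "
            "only numeric variables can be averaged."
        )
    return arr / np.asarray(nb_persons)


shares = mean_share(np.array([1.0, 3.0]), np.array([2, 3]))

try:
    mean_share(np.array(["CA", "NY"]), np.array([2, 3]))
    error_message = None
except TypeError as error:
    error_message = str(error)
```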

Remove `amount_over`

I'm unsure of the value of amount_over, which is currently used inconsistently over max_. I can see how it's more descriptive, but it's not completely obvious which argument is over which, and max_ will be familiar to more developers. It's also a bit less concise:

amount_over(x, y)
max_(x - y, 0)

I'd favor removing it at this point, but if we keep it, I'd suggest switching all relevant max_ statements over to it in openfisca-uk and openfisca-us.
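
For reference, the equivalence between the two forms can be sketched as follows. This assumes max_ behaves like np.maximum and that amount_over(x, y) means "the part of x above y"; the actual definitions in the package may differ.

```python
import numpy as np

max_ = np.maximum  # assumed to match the max_ alias discussed above


def amount_over(amount, threshold):
    """Return the part of amount that exceeds threshold, floored at zero."""
    return max_(amount - threshold, 0)


income = np.array([5_000, 12_000, 20_000])
excess = amount_over(income, 12_000)
same_as_max = np.array_equal(excess, max_(income - 12_000, 0))
```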

Add metadata field to pass metadata to children

E.g. folder parameters which should all be indexed to inflation.

Add `multiply`, `and_`, and `or_`

e.g. to replace this:

would_claim_CTC = benunit("would_claim_CTC", period)
claims_legacy_benefits = benunit("claims_legacy_benefits", period)
return would_claim_CTC & claims_legacy_benefits

with this:

return and_(benunit, period, ["would_claim_CTC", "claims_legacy_benefits"])

or if an or condition:

return or_(benunit, period, ["would_claim_CTC", "claims_legacy_benefits"])
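
A possible shape for these helpers, sketched under the assumption that the entity is callable as entity(variable_name, period) and that the named variables are boolean. fake_benunit is a stand-in invented so the example runs without OpenFisca.

```python
from functools import reduce

import numpy as np


def and_(entity, period, variables):
    """Return True where every named boolean variable is True."""
    values = [np.asarray(entity(name, period), dtype=bool) for name in variables]
    return reduce(np.logical_and, values)


def or_(entity, period, variables):
    """Return True where at least one named boolean variable is True."""
    values = [np.asarray(entity(name, period), dtype=bool) for name in variables]
    return reduce(np.logical_or, values)


def fake_benunit(variable, period):
    """Stand-in for a benefit-unit population (an assumption for this demo)."""
    data = {
        "would_claim_CTC": np.array([True, True, False, False]),
        "claims_legacy_benefits": np.array([True, False, True, False]),
    }
    return data[variable]


names = ["would_claim_CTC", "claims_legacy_benefits"]
both = and_(fake_benunit, 2022, names)
either = or_(fake_benunit, 2022, names)
```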

Require `add` and `aggr` to take lists of variable names or singletons

Rather than tuples.

`sum_of_variables` doesn't work with variables at different aggregate entity levels

See this notebook, where I'm trying to include a tax-unit-level variable (gi21) in spm_unit_benefits by adding it to that variable's sum_of_variables list.

When running calc after doing this, I get an error:

ValueError: You tried to compute the variable 'gi21' for the entity 'people'; however the variable 'gi21' is defined for 'tax_units'.

Accept `IndividualSim(reform=None)`

This could be helpful for functions, e.g. currently I'm doing this:

def single_person_sim(reform=None):
    if reform is None:  # Breaks if passing IndividualSim(None).
        sim = IndividualSim(year=2022)
    else:
        sim = IndividualSim(reform, year=2022)
    return sim
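
Until None is accepted directly, one workaround is to map None to the empty tuple before constructing the simulation, since the constructor signature quoted elsewhere on this page defaults reform to (). normalise_reform is a hypothetical helper name.

```python
def normalise_reform(reform=None):
    """Map None to an empty tuple, the default quoted for IndividualSim."""
    return () if reform is None else reform


# Intended use (IndividualSim itself is not imported in this sketch):
# sim = IndividualSim(normalise_reform(reform), year=2022)
no_reform = normalise_reform(None)
some_reform = normalise_reform(("example_reform",))
```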

Automatically list-ify single references

Often when writing parameter YAML files, we specify an object rather than a list of objects:

reference:
  title: x
  href: y

instead of

reference:
  - title: x 
    href: y

We should have a function f: ParameterNode -> ParameterNode that automatically applies this correction.
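
As a rough sketch of the correction, here is a version that works on the nested dictionaries a YAML loader would produce rather than on ParameterNode objects. That substitution is an assumption made to keep the example self-contained, and listify_references is a hypothetical name.

```python
def listify_references(node):
    """Recursively wrap any single `reference` mapping in a list."""
    if isinstance(node, dict):
        fixed = {key: listify_references(value) for key, value in node.items()}
        if isinstance(fixed.get("reference"), dict):
            fixed["reference"] = [fixed["reference"]]
        return fixed
    if isinstance(node, list):
        return [listify_references(item) for item in node]
    return node


raw = {"reference": {"title": "x", "href": "y"}}
fixed = listify_references(raw)
# Applying the function to already-correct input leaves it unchanged.
fixed_twice = listify_references(fixed)
```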

Using changelog.yaml tool

Change `and_` and `or_` to use `&` and `|` instead of `*` and `+` operators

Currently, and_ and or_ are aliases for multiply_ and add_, respectively, so the pairs are duplicative. add_ and multiply_ apply the + and * operators, respectively. I'd suggest that and_ and or_ instead apply the & and | operators, respectively.

This won't change the result: np.array(bool) * np.array(bool) = np.array(bool), for example. But it would be more explicit, and could improve performance.

Relevant code:

openfisca-tools/openfisca_tools/model_api.py

Lines 75 to 77 in e2bc593

agg_func = dict(
    add=lambda x, y: x + y, multiply=lambda x, y: x * y, max=max_, min=min_
)[agg_func]
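
The claim that the result is unchanged can be checked on small boolean arrays. This only holds when both inputs have a boolean dtype; with integer inputs, + and * would produce counts and products instead.

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

# On boolean arrays, * behaves like logical AND and + like logical OR.
product_matches_and = np.array_equal(a * b, a & b)
sum_matches_or = np.array_equal(a + b, a | b)
result_dtype = (a * b).dtype
```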

Add `all_`

any_ is currently an alias for or_, but we don't have a parallel alias for all_ to and_:

openfisca-tools/openfisca_tools/model_api.py

Lines 177 to 179 in e2bc593

or_ = add
any_ = or_
multiply = and_

I think we should adopt a standard for OpenFisca programming to use only one of these patterns. Since we call these as functions, I'd suggest any_ and all_, which more closely resemble the numpy and Python versions than and_ and or_ do.

That said, I'm indifferent on keeping or_ and and_ around. Python and numpy offer all four in some way, so maybe we could offer a warning that any_ and all_ are the standards and we suggest those instead, rather than breaking code? Open to suggestions here.

Fix randomness by simulation, not by record

To avoid correlating the randomness within records

Allow `vary` to accept parameters

Function to simplify categorical eligibility-checking pattern

For example, from the US CVRP PR:

p = parameters(period).states.ca.calepa.carb.cvrp.increased_rebate
categorically_eligible = np.any(
    [
        person.spm_unit(program, period)
        for program in p.categorical_eligibility
    ],
    axis=0,
)

Could we just use add(person.spm_unit, period, p.categorical_eligibility) > 0?

any_(entity, period, variables) would be useful nonetheless.
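
On the question of whether a sum compared with zero would do the same job: for boolean (or otherwise non-negative) inputs the two approaches agree, as this small check on made-up arrays suggests. The flags below are illustrative data, not output from any model.

```python
import numpy as np

# One boolean array per categorical-eligibility program (illustrative data).
flags = [
    np.array([True, False, False]),
    np.array([False, False, True]),
]

any_eligible = np.any(flags, axis=0)
sum_positive = np.sum(flags, axis=0) > 0
approaches_agree = np.array_equal(any_eligible, sum_positive)
```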

Make `select` an alias

Making it a function seems unnecessary; it could be an alias like clip and inf:

openfisca-tools/openfisca_tools/model_api.py

Lines 92 to 106 in 91e344e

def select(conditions, choices):
    """Selects the corresponding choice for the first matching condition in a list.

    Args:
        conditions (list): A list of boolean arrays
        choices (list): A list of arrays

    Returns:
        Array: Array of values
    """
    return np.select(conditions, choices)


clip = np.clip
inf = np.inf

Use list as arg to `is_in`

To mirror np.isin (until we can use that directly pending openfisca-core changes)

Remove `is_in` function

Can we use np.isin instead?
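
For comparison, this is how np.isin behaves on a plain NumPy string array. Whether it works unchanged on the array types returned by openfisca-core is not confirmed here, so treat this as an illustration of the target behaviour only.

```python
import numpy as np

state_code = np.array(["CA", "NY", "MA", "TX"])

# np.isin takes the array to test and a list of accepted values.
in_selected_states = np.isin(state_code, ["NY", "MA"])
```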

Informative error message when `sum_of_variables` receives a variable name not in the system

I did a sum_of_variables(["misspelled_variable"]) and the error wasn't very helpful for diagnosing.
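
A sketch of the kind of up-front check that could produce a clearer message. check_variables_exist and the known set are hypothetical; in practice the known names would come from the tax-benefit system.

```python
def check_variables_exist(variables, known_variables):
    """Raise a descriptive error naming any variables that are not defined."""
    missing = [name for name in variables if name not in known_variables]
    if missing:
        raise ValueError(
            "Variables not found in the tax-benefit system: " + ", ".join(missing)
        )


known = {"employment_income", "snap"}

try:
    check_variables_exist(["snap", "misspelled_variable"], known)
    message = None
except ValueError as error:
    message = str(error)
```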

Remove mistaken import from `turtle`

Here:

openfisca-tools/openfisca_tools/model_api.py

Line 2 in 5529848

from turtle import pd

Partial formula execution

Although this might be relevant to Core, I suspect it'd need a much longer discussion to avoid breaking changes, so I'm filing it here with a view to implementing it as a patch. There have been a few attempts already in #64, but they have some bugs, so I thought I'd sketch out the cleanest implementation here.

The problem

Some variables are only relevant to a small subset of the population. For example, Massachusetts income tax only needs to be calculated for people and groups in Massachusetts, and not the rest of the population. Right now, we implement the tax as simply zero for those other people, but this causes wasted computation time and space for 98% of entities, because NumPy vectorised operations happen regardless of the retrospective filter at the end.

The solution

We could have the following variable definition:

class ma_tax(Variable):
  value_type = float
  label = "MA income tax"
  definition_period = YEAR
  entity = TaxUnit

  def eligible(tax_unit, period, parameters):
    return tax_unit.household("state_code", period) == "MA"

  def formula(tax_unit, period, parameters):
    ...

eligible is run first to determine the relevant subset of the population, and then the main formula next. This will be much more efficient iff the formula is much more complex than eligible.

#64 has a prototype of the implementation, but it's buggy and needs more thought. I think there's a clean way to do this, intercepting the population passed to the formula to only return the subset values.
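
The core idea (evaluate the formula only on eligible rows, then fill everyone else with a default) can be sketched in plain NumPy. This is not the interception mechanism itself; apply_where_eligible is a hypothetical helper and the 5% rate is purely illustrative.

```python
import numpy as np


def apply_where_eligible(eligible_mask, formula, inputs, default=0.0):
    """Run formula only on eligible rows and fill the rest with default."""
    eligible_mask = np.asarray(eligible_mask, dtype=bool)
    result = np.full(eligible_mask.shape, default, dtype=float)
    # Subset every input once, so the formula never sees ineligible rows.
    subset = {
        name: np.asarray(values)[eligible_mask] for name, values in inputs.items()
    }
    result[eligible_mask] = formula(subset)
    return result


state_code = np.array(["MA", "CA", "MA", "NY"])
taxable_income = np.array([50_000.0, 80_000.0, 20_000.0, 60_000.0])

# Illustrative flat 5% rate, not the actual Massachusetts schedule.
ma_tax = apply_where_eligible(
    state_code == "MA",
    lambda data: 0.05 * data["taxable_income"],
    {"taxable_income": taxable_income},
)
```

Here the formula runs on two rows instead of four; the saving grows with the share of the population that is ineligible.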

cc @MattiSG, @MaxGhenis, @rickecon

Rename `Microsimulation` to `GeneralMicrosimulation`

Would this make sense since it gets renamed in the country packages?

`aggr` failure catches more errors than are actually the cause

Leading to misdiagnoses like in PolicyEngine/policyengine-us#596
