openfisca-us-data's Introduction

PolicyEngine

This repository contains the core infrastructure for policyengine.org. Namely:

  • policyengine, a Python package which contains the server-side implementations, and
  • policyengine-client, a React library containing high-level components to build the client-side interface.

Development

NOTE: requires Python 3.7

First, ensure you have pnpm installed: https://pnpm.io/installation.

Then install with make install. To debug the client, run make debug-client; to debug the server, run make debug-server.

If your changes involve the server, change useLocalServer = false; to useLocalServer = true; in policyengine-client/src/countries/country.jsx. Otherwise, change usePolicyEngineOrgServer = false; to usePolicyEngineOrgServer = true; in policyengine-client/src/countries/country.jsx.

If you don't have access to the UK Family Resources Survey, you can still run the UK population-wide calculator on an anonymised version. To do that, run UK_SYNTHETIC=1 make debug-server instead of make debug-server.

openfisca-us-data's People

Contributors

baogorek, maxghenis, nikhilwoodruff

openfisca-us-data's Issues

Empty households

Trying to debug:

from openfisca_us import Microsimulation
sim = Microsimulation()
sim.calc("household_net_income")

This breaks: summing net income within households returns only 60k households, but there are 90k households. I dug around, and this seems to be because some household IDs in the CPS household table (household.H_SEQ) have no people. MWE:

from openfisca_us_data import CPS
household = CPS.load(2020, "household")
person = CPS.load(2020, "person")
4 in household.H_SEQ.values and 4 not in person.PH_SEQ.values
>> True

@MaxGhenis @nmrodelo @tolaouk if you've worked with the CPS indexes before and can spot a simple error on my part here, would be much appreciated! Thanks - this should get policyengine-us pretty close to functional.
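
The membership check in the MWE above can be extended to every household ID at once. The following is a minimal sketch on made-up toy tables, assuming only the H_SEQ and PH_SEQ column names quoted in this issue; it is not taken from the package.

```python
import pandas as pd

# Toy stand-ins for the CPS household and person tables (values are made up).
household = pd.DataFrame({"H_SEQ": [1, 2, 3, 4]})
person = pd.DataFrame({"PH_SEQ": [1, 1, 2, 3, 3, 3]})

# True for each household ID that appears at least once in the person table.
has_people = household.H_SEQ.isin(person.PH_SEQ)

# Household IDs with no matching people.
empty_ids = household.H_SEQ[~has_people].tolist()

# One possible fix: keep only the populated households.
populated = household[has_people]
```

Whether dropping empty households is the right fix depends on what those records represent in the CPS, which this sketch does not settle.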

.fillna(0) used in RawCPS, should probably be used in CPS

As I'm working on my own raw class functionality, I'm looking for conventions to follow. However, I'm wondering whether to follow the convention on lines 47, 49 and 51 of openfisca_us_data/datasets/cps/raw_cps.py, which fill in missing data with 0s like so:

storage["person"] = person = pd.read_csv(f).fillna(0)

I assume a processing operation like this would be better suited to the CPS method rather than RawCPS. If not, just let me know the reasoning behind putting it here as I'm making similar decisions for the CE survey.

Edit: Ah I'm seeing the functions below that take sums and probably need those 0s filled in. I guess I'm still struggling with what processing goes in Raw.
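
To make the trade-off concrete, here is a small sketch with made-up column names and values (nothing here comes from raw_cps.py). It shows why sums built by adding columns need the 0s, and how the fill could happen at processing time while the raw table keeps its missing values.

```python
import numpy as np
import pandas as pd

# Made-up raw rows; NaN marks a value missing from the source CSV.
raw = pd.DataFrame({"wages": [100.0, np.nan], "interest": [5.0, 2.0]})

# Element-wise arithmetic propagates NaN, so the second row's total is lost.
total_raw = raw.wages + raw.interest

# Filling at processing time leaves the raw table faithful to the source.
filled = raw.fillna(0)
total_filled = filled.wages + filled.interest
```

Note that Series.sum() skips NaN by default, so only element-wise arithmetic like the addition above depends on the fill.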

Add remaining variables

At this point, we should check that all variables are covered and evaluate any that are outstanding.

Change ASEC year to be survey year

For example, openfisca-us-data raw_cps generate 2021 currently downloads the "2021 ASEC", which captures activity for calendar year 2020 (the survey was administered in March 2021). I think we should subtract 1 from all years to align with policy years.
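
The proposed offset can be written as a pair of helpers. This is only a sketch; the function names are hypothetical and do not exist in the package.

```python
def asec_release_for(policy_year: int) -> int:
    """Return the ASEC release whose data covers the given calendar year."""
    # The ASEC is administered in March of the year after the one it covers.
    return policy_year + 1


def policy_year_for(asec_release: int) -> int:
    """Return the calendar year covered by the given ASEC release."""
    return asec_release - 1
```

With this convention, a request for 2020 would download the 2021 ASEC.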

Error generating cps from Colab

See this notebook.

!openfisca-us-data raw_cps generate 2021 works, but then !openfisca-us-data cps generate 2021 produces:

/usr/local/lib/python3.7/dist-packages/openfisca_core/parameters/config.py:17: LibYAMLWarning: libyaml is not installed in your environment. This can make OpenFisca slower to start. Once you have installed libyaml, run 'pip uninstall pyyaml && pip install pyyaml --no-cache-dir' so that it is used in your Python environment.

  warnings.warn(" ".join(message), LibYAMLWarning)
tcmalloc: large alloc 1082007552 bytes == 0x55b0a79ea000 @  0x7f62828df1e7 0x7f627fd4046e 0x7f627fd90c7b 0x7f627fd9135f 0x7f627fe33103 0x55b0a3936544 0x55b0a3936240 0x55b0a39aa627 0x55b0a3937afa 0x55b0a39a5c0d 0x55b0a3939b6b 0x55b0a397a9c9 0x55b0a397a93c 0x55b0a3a1e409 0x55b0a39a5e7a 0x55b0a39a49ee 0x55b0a3937bda 0x55b0a39a6737 0x55b0a39a49ee 0x55b0a3937bda 0x55b0a39a5c0d 0x55b0a3937afa 0x55b0a39a9d00 0x55b0a3937afa 0x55b0a39a9d00 0x55b0a3939b6b 0x55b0a397a9c9 0x55b0a397a93c 0x55b0a3a1e409 0x55b0a39a5e7a 0x55b0a39a4ced
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/openfisca_us_data/cli.py", line 17, in main
    return getattr(datasets[args.dataset], args.action)(*args.args)
  File "/usr/local/lib/python3.7/dist-packages/openfisca_us_data/utils.py", line 97, in new_generate_func
    return generate_func(year, *args)
  File "/usr/local/lib/python3.7/dist-packages/openfisca_us_data/datasets/cps/cps.py", line 30, in generate
    "person",
  File "/usr/local/lib/python3.7/dist-packages/openfisca_us_data/datasets/cps/cps.py", line 29, in <listcomp>
    for entity in (
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/pytables.py", line 569, in __getitem__
    return self.get(key)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/pytables.py", line 792, in get
    return self._read_group(group)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/pytables.py", line 1810, in _read_group
    return s.read()
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/pytables.py", line 3161, in read
    values = self.read_array(f"block{i}_values", start=_start, stop=_stop)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/pytables.py", line 2818, in read_array
    ret = node[0][start:stop]
  File "/usr/local/lib/python3.7/dist-packages/tables/vlarray.py", line 681, in __getitem__
    return self.read(start, stop, step)[0]
  File "/usr/local/lib/python3.7/dist-packages/tables/vlarray.py", line 821, in read
    listarr = self._read_array(start, stop, step)
  File "tables/hdf5extension.pyx", line 2155, in tables.hdf5extension.VLArray._read_array
ValueError: cannot set WRITEABLE flag to True of this array

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/openfisca-us-data", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/openfisca_us_data/cli.py", line 19, in main
    print(f"Encountered an error: {e.with_traceback()}")
TypeError: with_traceback() takes exactly one argument (0 given)
Closing remaining open files:/usr/local/lib/python3.7/dist-packages/openfisca_us_data/microdata/external/raw_cps_2021.h5...done
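
The second traceback comes from the error handler itself: BaseException.with_traceback() requires a traceback argument and returns the exception, so it cannot be used to get a printable traceback. A minimal sketch of one alternative, using the standard library's traceback.format_exc(), follows; the run and failing functions are hypothetical stand-ins for the CLI's main and the dataset action.

```python
import traceback


def run(action):
    """Run `action`, printing the full traceback if it raises."""
    try:
        return action()
    except Exception:
        # format_exc() returns the active exception's traceback as a string.
        message = traceback.format_exc()
        print(f"Encountered an error:\n{message}")
        return message


def failing():
    # Stand-in for the dataset call that raised in the log above.
    raise ValueError("cannot set WRITEABLE flag to True of this array")


output = run(failing)
```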

Add individual financial variables

We need to make sure the CPS dataset includes the data from:

  • e00200
  • e00900
  • pencon

These should be person-level (one variable in place of two _p and _s variables). In taxdata they are tax-unit level, but only refer to one person within that tax unit.
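
One way to collapse a head/spouse column pair into a single person-level variable is sketched below. The table layouts, the role column, and the sample values are assumptions made for illustration; only the _p/_s naming and the pencon variable come from this issue.

```python
import pandas as pd

# Hypothetical tax-unit table with head (_p) and spouse (_s) columns.
tax_unit = pd.DataFrame({
    "tax_unit_id": [1, 2],
    "pencon_p": [5_000, 3_000],
    "pencon_s": [2_000, 0],
})

# Hypothetical person table with each person's role in the tax unit.
person = pd.DataFrame({
    "person_id": [10, 11, 20, 21],
    "tax_unit_id": [1, 1, 2, 2],
    "role": ["head", "spouse", "head", "dependent"],
})

merged = person.merge(tax_unit, on="tax_unit_id", how="left")

# Heads take the _p value, spouses the _s value, everyone else gets 0.
merged["pencon"] = 0
merged.loc[merged.role == "head", "pencon"] = merged.pencon_p
merged.loc[merged.role == "spouse", "pencon"] = merged.pencon_s

person_level = merged[["person_id", "pencon"]]
```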

Add benefit income variables

The relevant variables are:

  • mcaid_ben
  • mcare_ben
  • ssi_ben
  • tanf_ben
  • vet_ben
  • wic_ben
  • snap_ben
  • housing_ben
  • other_ben

This should be de-prioritised since we're aiming to model many of these anyway.

CPS

The variables in cps.csv.gz that we need openfisca-us-data to replicate are listed in the comment below. Each issue in this epic breaks the project down into a smaller group of variables that can probably be implemented together. For each issue (or variable added), we'll need to add them in a PR to the CPS module, with unit tests that check aggregates are close to expected values.
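
An aggregate check of the kind described above could look like the following sketch. The helper names, sample values, expected totals, and the 5% tolerance are all made up for illustration.

```python
import math


def weighted_total(values, weights):
    """Sum of values multiplied by their survey weights."""
    return sum(v * w for v, w in zip(values, weights))


def is_close_to_target(actual, expected, rel_tol=0.05):
    """True if `actual` is within the relative tolerance of `expected`."""
    return math.isclose(actual, expected, rel_tol=rel_tol)


# Made-up sample: three records with survey weights.
employment_income = [40_000, 0, 25_000]
weights = [1_000, 1_500, 2_000]
total = weighted_total(employment_income, weights)
```

A unit test would then assert is_close_to_target(total, expected) for each variable's expected aggregate.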

Add individual demographic variables

This ensures that we use all the information (casting to person-level where necessary) from:

  • age
  • blind (from blind_head and blind_spouse)
  • fips

Ensure OpenFisca-US completes a full microsimulation

This is the remaining step needed to get a basic PolicyEngine US working, and it'll likely show more errors not found in OpenFisca-US unit testing (acceptably so: the thousands of CPS households will provide more edge cases).

Utility to download and load file with progress bar

The status bar was creating errors in the ACS in #49 so I removed it. Once it's made reliable it'd be good to make it a utility since it's pretty similar across the CPS and ACS, and would also be similar for other datasets. Though maybe belongs in an openfisca-data package...
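
As a starting point, the shared part could be a chunked copy that reports progress through a callback, so the display is kept separate from the transfer. This is a standard-library sketch with a hypothetical function name; in practice the source could be an HTTP response and the callback could update a progress bar.

```python
import io
from typing import BinaryIO, Callable


def copy_with_progress(
    src: BinaryIO,
    dst: BinaryIO,
    report: Callable[[int], None],
    chunk_size: int = 1024,
) -> int:
    """Copy src to dst in chunks, reporting the running byte count."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        report(copied)
    return copied


# In-memory demonstration; a real caller would pass a download stream.
source = io.BytesIO(b"x" * 2500)
target = io.BytesIO()
updates = []
copied = copy_with_progress(source, target, updates.append)
```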

Archive repo

After moving the Consumer Expenditure Survey code to openfisca-us, and transferring issues there.

Throw informative error message when running `cps generate` before `raw_cps generate`

e.g. here's what happens when trying to generate the CPS for a year prior to generating the raw CPS:

(base) mghenis@penguin:~/PolicyEngine/openfisca-us-data$ openfisca-us-data cps generate 2018                                                                    
Downloaded ASEC: 100%|████████████████████| 13.5k/13.5k [00:00<00:00, 175kiB/s]
Traceback (most recent call last):
  File "/home/mghenis/PolicyEngine/openfisca-us-data/openfisca_us_data/datasets/cps/raw_cps.py", line 40, in generate                                           
    zipfile = ZipFile(file)
  File "/home/mghenis/anaconda3/lib/python3.8/zipfile.py", line 1269, in __init__                                                                               
    self._RealGetContents()
  File "/home/mghenis/anaconda3/lib/python3.8/zipfile.py", line 1336, in _RealGetContents                                                                       
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mghenis/PolicyEngine/openfisca-us-data/openfisca_us_data/cli.py", line 17, in main                                                                
    return getattr(datasets[args.dataset], args.action)(*args.args)
  File "/home/mghenis/PolicyEngine/openfisca-us-data/openfisca_us_data/utils.py", line 94, in new_generate_func                                                 
    return generate_func(year, *args)
  File "/home/mghenis/PolicyEngine/openfisca-us-data/openfisca_us_data/datasets/cps/cps.py", line 22, in generate                                               
    RawCPS.generate(year)
  File "/home/mghenis/PolicyEngine/openfisca-us-data/openfisca_us_data/utils.py", line 94, in new_generate_func                                                 
    return generate_func(year, *args)
  File "/home/mghenis/PolicyEngine/openfisca-us-data/openfisca_us_data/datasets/cps/raw_cps.py", line 51, in generate                                           
    f"Attempted to extract and save the CSV files, but encountered an error: {e.with_traceback()}"                                                              
TypeError: with_traceback() takes exactly one argument (0 given)

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/mghenis/anaconda3/bin/openfisca-us-data", line 33, in <module>
    sys.exit(load_entry_point('openfisca-us-data', 'console_scripts', 'openfisca-us-data')())                                                                   
  File "/home/mghenis/PolicyEngine/openfisca-us-data/openfisca_us_data/cli.py", line 19, in main                                                                
    print(f"Encountered an error: {e.with_traceback()}")
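
A guard along the following lines would give a clearer message. It is a sketch only: the function name and folder argument are hypothetical, and the raw_cps_{year}.h5 file name is taken from the Colab log in the issue above, not from the package's source.

```python
import tempfile
from pathlib import Path


def require_raw_cps(year: int, folder: Path) -> Path:
    """Return the raw CPS file path, or explain how to create the file."""
    path = folder / f"raw_cps_{year}.h5"
    if not path.exists():
        raise FileNotFoundError(
            f"Raw CPS data for {year} not found at {path}. "
            f"Run `openfisca-us-data raw_cps generate {year}` first."
        )
    return path


# Demonstration against an empty temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    try:
        require_raw_cps(2018, Path(tmp))
    except FileNotFoundError as error:
        message = str(error)
```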

RawACS generation is failing

Due to pandas-dev/pandas#16615:

openfisca-us-data raw_acs generate 2018

throws:

ValueError: Length of values (3061064) does not match length of index (1615763)

This also occurs when doing it from Python:

pd.read_sas("https://www2.census.gov/programs-surveys/supplemental-poverty-measure/datasets/spm/spm_pu_2018.sas7bdat")

or when trying the encoding suggestion from SO:

pd.read_sas("https://www2.census.gov/programs-surveys/supplemental-poverty-measure/datasets/spm/spm_pu_2018.sas7bdat", encoding="iso-8859-1")

Call yaml.load with Loader

Tests throw this warning:

  /home/runner/work/openfisca-us-data/openfisca-us-data/tests/cps/test_aggregates.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
    tc = yaml.load(f)
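
A likely fix is to use PyYAML's safe loader, which avoids the warning and does not construct arbitrary Python objects. The sketch below reads from an in-memory document with made-up content, since the contents of the test file are not shown here.

```python
import io

import yaml

# Made-up YAML content standing in for the test configuration file.
document = io.StringIO("tolerance: 0.05\n")

# safe_load(f) is equivalent to yaml.load(f, Loader=yaml.SafeLoader).
tc = yaml.safe_load(document)
```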
