clean_rais

Cleaning the Relação Anual de Informações Sociais (RAIS) dataset in Stata, 1985-2018

This repository contains Stata code that cleans and normalizes the RAIS data for every year from 1985 to 2018.

RAIS is the Brazilian matched employer-employee dataset.

Requirements

  • Stata (preferably version 14+)

Basic Usage

  1. Clone or download the repository.
  2. Place the raw RAIS data files in /input.
  3. Adjust the directory paths in the do-files to match your own setup, then run each year's do-file in /src/sub.
  4. Run the do-file /src/sub/build_subsets.do.
  5. Run the do-file /src/sub/build_collapses.do.
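
For those who prefer to script the run order, steps 3 to 5 above can be sketched as follows. The repository's own code is Stata; this Python sketch is not part of the repository and only builds the list of batch commands. It assumes that every do-file in /src/sub other than the two build files is a per-year do-file, that sorting by file name gives an acceptable order, and that the Stata executable is called `stata-mp` (this varies by installation; `-b do` is Stata's batch mode on Unix-like systems). The helper name `plan_run` is hypothetical.

```python
from pathlib import Path

# The two build do-files named in steps 4 and 5 must run last, in this order.
FINAL_STEPS = ["build_subsets.do", "build_collapses.do"]


def plan_run(repo_root, stata_exe="stata-mp"):
    """Return the Stata batch commands for steps 3-5, in run order.

    Assumption: every other .do file in src/sub is a per-year do-file.
    """
    sub = Path(repo_root) / "src" / "sub"
    # Per-year do-files first, sorted by file name.
    yearly = sorted(p for p in sub.glob("*.do") if p.name not in FINAL_STEPS)
    # Then the subset and collapse builders, in the order the README gives.
    ordered = yearly + [sub / name for name in FINAL_STEPS]
    return [[stata_exe, "-b", "do", str(p)] for p in ordered]
```

Each returned command can then be passed to `subprocess.run(cmd, check=True)` once the directory paths inside the do-files have been adjusted.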

Output

This repository outputs cleaned and normalized RAIS data. It generates three main sets of datasets: (1) at the worker-establishment-municipality level, (2) at the worker-municipality level, and (3) at the establishment-municipality level. It also builds collapsed datasets at the establishment-municipality and establishment levels.
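
To illustrate what collapsing to a coarser level means, here is a minimal Python sketch (the repository itself does this in Stata). The variable names `worker_id`, `estab_id`, `muni`, and `wage`, and the choice of summary statistics (worker count and mean wage), are assumptions made for the example; the actual collapsed variables are documented in the repository's variable dictionary.

```python
from collections import defaultdict


def collapse(rows, keys):
    """Aggregate worker-level rows to one row per combination of `keys`.

    Hypothetical statistics: number of distinct workers and mean wage.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r)

    out = []
    for key, members in sorted(groups.items()):
        rec = dict(zip(keys, key))
        rec["n_workers"] = len({m["worker_id"] for m in members})
        rec["mean_wage"] = sum(m["wage"] for m in members) / len(members)
        out.append(rec)
    return out
```

Calling `collapse(rows, ["estab_id", "muni"])` gives the establishment-municipality level, and `collapse(rows, ["estab_id"])` gives the establishment level.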

The code applies the following fixes and additions to the original data:

  • It standardizes all variable names and labels.
  • It fixes wage variables with missing values.
  • It generates deflated wage variables, with 2018 as the base year.
  • It can output sample datasets, for those who prefer to work with smaller files.
  • It standardizes classification variables (CNAE and CBO), and builds IBGE's broad sectors variables.
  • It classifies establishments as public, private, or nonprofit, and by sphere and branch of government.
  • It backfills CPF numbers to years before 2003 for workers who also appear in those earlier years.
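
Two of the items above, wage deflation and CPF backfilling, can be illustrated with a short Python sketch. This is not the repository's Stata code. The price index values, the record layout, the linking key `worker_id`, and the rule of skipping workers with conflicting CPFs are all assumptions made for the example.

```python
def deflate(nominal, index_year, index_base):
    """Express a nominal wage in base-year prices.

    index_year: price index in the year the wage was paid
    index_base: price index in the base year (2018 in this repository)
    """
    return nominal * index_base / index_year


def backfill_cpf(rows):
    """Fill missing CPFs using the CPF seen for the same worker in other years.

    Assumption: rows are dicts with hypothetical keys worker_id, year, cpf.
    Workers with more than one distinct CPF are left untouched.
    """
    seen = {}
    for r in rows:
        if r["cpf"]:
            seen.setdefault(r["worker_id"], set()).add(r["cpf"])

    out = []
    for r in rows:
        cpfs = seen.get(r["worker_id"], set())
        if not r["cpf"] and len(cpfs) == 1:
            r = {**r, "cpf": next(iter(cpfs))}  # copy; inputs stay unchanged
        out.append(r)
    return out
```

For example, with a made-up index of 50 in the wage year and 100 in the base year, `deflate(1000.0, 50.0, 100.0)` doubles the nominal wage.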

Tips

  • See the file /extra/Variables_RAIS_1985-2018.xlsx for a complete dictionary of variables, labels, values, and year-by-year availability.
  • Identified RAIS data is not public. To get access, one must either (1) be at a university or institution that already has an agreement with the Ministério da Economia, or (2) apply for new access.
  • Run the code on a high-performance server. RAIS files are large.
  • For advice on structuring directories and code, please refer to my template repository.
  • Prof. Marc Muendler (UCSD) has useful material about RAIS.

Credits

If you benefit from code in this repository, please cite it in your work as:

Bugs, Comments and Suggestions

If you find any issues in my code or have suggestions for improvements, please open an issue or email me.

Contributors

  • rdahis
