The Belk Endowment Educational Attainment Data Repository for North Carolina Public Schools

The North Carolina Public Schools Report Card and Statistical Profiles Databases contain a large volume of information about public, charter, and alternative schools in the State of North Carolina. The publicly accessible information covers the school, district, and state levels. It includes statistics on student and school performance, academic growth, diversity, safety, instructor experience levels, school funding, educational attainment, and much more.

What is Available?

This repository maintains educational attainment data specific to North Carolina Public School Campuses, organized by academic school year. All data sources are pulled directly from multiple locations at http://ncpublicschools.org. This is an open source repository. All source code is currently written in Python / Pandas / Scikit Learn and published for review via iPython Notebooks. No programming software or special tools are required to view the source code, file contents, and outputs at each processing stage. Source code may be viewed in your web browser by clicking on the links in each directory's ReadMe.md content.

How is the Data Organized?

Each academic school year folder contains the following data:

  • Raw Datasets - These are the original data sources provided by http://ncpublicschools.org. We write code to download each dataset directly from the original URL, filter by academic school year (when necessary), rename the year and unit_code fields for consistency, and then save the data in its original format prior to any further processing.
  • School Datasets - Once each raw data source is filtered by school year and saved in its original format, all files are processed, consolidated, and merged into a single file containing one record per public school campus, per year. This process is complex and requires multiple table pivots and various other data transformations, since many of the original data files contain multiple records per school campus, per year. Once a master "Public Schools" file is created, we also create three additional files with campuses segmented at the high school, middle school, and elementary school levels. We publish all of our source code, so you may click on any of the respective file links to view the original data file URLs and the transformations at each processing stage.
  • Machine Learning Datasets - One machine learning (_ML) dataset is created for each respective public school dataset. These datasets are intended for classification or regression modeling and have gone through many additional transformations specific to data pre-processing for machine learning. This includes removing columns that have large amounts of missing data, contain all unique values, or are duplicated or highly correlated with other columns. In addition, all categorical variables are converted to numeric data via a process called one-hot encoding. Specific documentation and transformation reports for each dataset are available in the source code folders.
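
The three stages above can be sketched roughly as follows. This is a minimal illustration in pandas, not the repository's actual notebook code: the toy tables, the column names other than year and unit_code, the function names, and the 50% missing-data threshold are all assumptions made for the example. Removal of duplicated and highly correlated columns is left out for brevity.

```python
import pandas as pd

# Hypothetical raw extract: several rows per campus, per year (long format).
raw = pd.DataFrame({
    "school_year": [2017, 2017, 2017, 2017, 2017, 2017, 2016],
    "School Code": ["A01", "A01", "B02", "B02", "C03", "C03", "A01"],
    "subject": ["math", "reading", "math", "reading", "math", "reading", "math"],
    "pct_proficient": [61.0, 70.0, 48.0, 55.0, 66.0, 72.0, 59.0],
})

# Hypothetical campus profile: already one row per campus.
profile = pd.DataFrame({
    "unit_code": ["A01", "B02", "C03"],
    "school_name": ["Alpha High", "Beta Elementary", "Gamma High"],
    "category": ["High", "Elementary", "High"],
    "note": [None, None, "x"],  # mostly missing
})


def filter_and_rename(df, year):
    """Raw stage: standardize the key column names and keep one school year."""
    df = df.rename(columns={"school_year": "year", "School Code": "unit_code"})
    return df[df["year"] == year].reset_index(drop=True)


def one_row_per_campus(long_df, profile_df):
    """School stage: pivot repeated rows into columns, then merge on unit_code."""
    wide = long_df.pivot_table(
        index="unit_code", columns="subject",
        values="pct_proficient", aggfunc="first",
    )
    wide.columns = [f"pct_proficient_{c}" for c in wide.columns]
    return profile_df.merge(wide.reset_index(), on="unit_code", how="left")


def make_ml_dataset(df, id_col="unit_code", max_missing=0.5):
    """ML stage: drop unhelpful columns, then one-hot encode categoricals."""
    df = df.set_index(id_col)
    # Drop columns whose share of missing values exceeds the threshold.
    df = df.loc[:, df.isna().mean() <= max_missing]
    # Drop non-numeric columns in which every value is unique (e.g. names).
    text_cols = [c for c in df.columns
                 if not pd.api.types.is_numeric_dtype(df[c])]
    all_unique = [c for c in text_cols if df[c].nunique() == len(df)]
    df = df.drop(columns=all_unique)
    # Convert the remaining categorical columns to 0/1 indicator columns.
    return pd.get_dummies(df, dtype=int)


schools = one_row_per_campus(filter_and_rename(raw, 2017), profile)
high_schools = schools[schools["category"] == "High"]  # one level segment
schools_ml = make_ml_dataset(schools)
print(schools_ml)
```

Keeping each stage as its own function mirrors the folder layout described above: the output of each step can be saved and inspected before the next transformation is applied.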

Data Documentation

The following resources and reports are currently available to assist in understanding table, field, and code definitions for public school data:

  • data-dictionary.pdf - Metadata file containing field definitions by table for most fields in the NC Report Card database and All_Data_By_School_Final.xlsx

Reports

This folder includes links to research and reports produced using data in this repository.

Citations

If you use this repository, please cite it and send us an email!

References

Drew J., The Belk Endowment Educational Attainment Data Repository for North Carolina Public Schools, (2018), GitHub repository, https://github.com/jakemdrew/EducationDataNC

BibTeX

@misc{BelkNCEARepo,
     author = {Drew, J.},
     title = {The Belk Endowment Educational Attainment Data Repository for North Carolina Public Schools},
     year = {2018},
     publisher = {GitHub},
     journal = {GitHub repository},
     howpublished = {\url{https://github.com/jakemdrew/EducationDataNC}}
}

Acknowledgements

John M Belk Endowment

This research is made possible by: http://jmbendowment.org/
