The Public Utility Data Liberation Project (PUDL)

What is PUDL?

The PUDL Project is an open-source data processing pipeline that makes US energy data easier to access and use programmatically.

US government agencies publish hundreds of gigabytes of valuable data, but it is often difficult to work with. PUDL takes the original spreadsheets, CSV files, and databases and turns them into a unified resource. This allows users to spend more time on novel analysis and less time on data preparation.

The project is focused on serving researchers, activists, journalists, policy makers, and small businesses that might not otherwise be able to afford access to this data from commercial sources and who may not have the time or expertise to do all the data processing themselves from scratch.

We want to make this data accessible and easy to work with for as wide an audience as possible: anyone from grassroots youth climate organizers working with Google Sheets to university researchers with access to scalable cloud computing resources, and everyone in between!

PUDL consists of three core components:

Raw Data Archives

PUDL archives all our raw inputs on Zenodo to ensure permanent, versioned access to the data. In the event that an agency changes how they publish data or deletes old files, the data processing pipeline will still have access to the original inputs. Each of the data inputs may have several different versions archived, and all are assigned a unique DOI (digital object identifier) and made available through Zenodo's REST API. You can read more about the Raw Data Archives in the docs.
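
As a rough illustration of how a Zenodo DOI relates to the REST API, the sketch below converts a DOI into a record URL. It assumes the common Zenodo DOI form `10.5281/zenodo.<record id>` and the `https://zenodo.org/api/records/<record id>` endpoint. The function name `zenodo_record_url` and the example DOI are placeholders for illustration, not part of PUDL and not a real PUDL archive.

```python
import re

ZENODO_API = "https://zenodo.org/api/records"


def zenodo_record_url(doi: str) -> str:
    """Return the Zenodo REST API URL for a DOI like 10.5281/zenodo.1234567."""
    match = re.fullmatch(r"10\.5281/zenodo\.(\d+)", doi.strip())
    if match is None:
        raise ValueError(f"Not a Zenodo DOI: {doi!r}")
    return f"{ZENODO_API}/{match.group(1)}"


# Placeholder DOI for illustration only; it does not refer to a PUDL archive.
print(zenodo_record_url("10.5281/zenodo.1234567"))
```

Fetching the resulting URL with any HTTP client should return the record's metadata as JSON; no request is made here so the sketch runs offline.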

Data Pipeline

The data pipeline (this repo) ingests raw data from the archives, cleans and integrates it, and writes the resulting tables to SQLite and Apache Parquet files, with some accompanying metadata stored as JSON. Each release of the PUDL software contains a set of DOIs indicating which versions of the raw inputs it processes. This helps ensure that the outputs are replicable. You can read more about our ETL (extract, transform, load) process in the PUDL documentation.
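
To make the extract, transform, load pattern concrete, here is a toy sketch using only the Python standard library. It is not PUDL's actual pipeline: the CSV contents, the `plants` table, the column names, and the metadata fields are all invented for illustration, and Parquet output is omitted because it needs a third-party library.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw input; real PUDL inputs are far larger and messier.
RAW_CSV = """plant_name,capacity_mw
 Alpha Station ,100.5
Beta Plant,N/A
"""


def extract(text):
    """Parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Strip whitespace and convert capacity to float, using None for missing values."""
    cleaned = []
    for row in rows:
        raw_capacity = row["capacity_mw"].strip()
        capacity = None if raw_capacity in ("", "N/A") else float(raw_capacity)
        cleaned.append((row["plant_name"].strip(), capacity))
    return cleaned


def load(rows, conn):
    """Write cleaned rows to a SQLite table and return JSON metadata about it."""
    conn.execute("CREATE TABLE plants (plant_name TEXT, capacity_mw REAL)")
    conn.executemany("INSERT INTO plants VALUES (?, ?)", rows)
    conn.commit()
    return json.dumps({"table": "plants", "row_count": len(rows)})


conn = sqlite3.connect(":memory:")
metadata = load(transform(extract(RAW_CSV)), conn)
print(metadata)
```

Keeping the three stages as separate functions means each can be tested on its own, which is one common way to structure a small ETL script.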

Data Warehouse

The SQLite, Parquet, and JSON outputs from the data pipeline, sometimes called "PUDL outputs", are updated each night by an automated build process, and periodically archived so that users can access the data without having to install and run our data processing system. These outputs contain hundreds of tables and comprise a small file-based data warehouse that can be used for a variety of energy system analyses. Learn more about how to access the PUDL data.
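
Because the outputs are ordinary SQLite files, a first look at one can be done with Python's built-in `sqlite3` module. The sketch below lists the tables in a database and reads a few rows. The in-memory database and the `example_*` table names are stand-ins invented for this example and are not actual PUDL table names.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in a SQLite database, sorted alphabetically."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cursor]


# Stand-in for a downloaded output file; with a real file you would call
# sqlite3.connect("path/to/file.sqlite") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_generators (generator_id TEXT, capacity_mw REAL)")
conn.execute("CREATE TABLE example_plants (plant_id INTEGER, plant_name TEXT)")
conn.execute("INSERT INTO example_plants VALUES (1, 'Alpha Station')")

print(list_tables(conn))
print(conn.execute("SELECT * FROM example_plants").fetchall())
```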

What data is available?

PUDL currently integrates data from:

Thanks to support from the Alfred P. Sloan Foundation Energy & Environment Program, from 2021 to 2024 we will be cleaning and integrating the following data as well:

How do I access the data?

For details on how to access PUDL data, see the data access documentation. A quick summary:

Contributing to PUDL

Find PUDL useful? Want to help make it better? There are lots of ways to help!

Licensing

In general, our code, data, and other work are permissively licensed for use by anybody, for any purpose, so long as you give us credit for the work we've done.

Contact Us

About Catalyst Cooperative

Catalyst Cooperative is a small group of data wranglers and policy wonks organized as a worker-owned cooperative consultancy. Our goal is a more just, livable, and sustainable world. We integrate public data and perform custom analyses to inform public policy (Hire us!). Our focus is primarily on mitigating climate change and improving electric utility regulation in the United States.
