The campaign_expenditures from data4democracy

title	author	date	output
Campaign Spending Analysis: Read This First	@eric_bickel and @ryanes	1/17/2017	html_document

Slack: #propublica

Project Leads: @eric_bickel, @ryanes

Project Description: This ProPublica repository is part of Data for Democracy. Our purpose is to collaboratively work through analytic processes that support the journalism at ProPublica. This repository in particular contains analysis of campaign spending data. Currently, contributors are focusing on cleaning the campaign spending dataset. We are always open to ideas for how to work with this dataset to make it more useful to ProPublica. Please contact @ryanes or @eric_bickel on Slack with any suggestions or questions.

New contributors should review the analysis workflow below and then read the dataset description to access the campaign spending data and review the data cleaning methods.

Analysis Workflow

Reading, cleaning, and analyzing data should be done in a reproducible notebook format when possible. When submitting pull requests, please submit them from a fork of the repository and on a separate branch. Data for Democracy has an awesome set of instructions for how to do this if you need them.

Organizing Work

If contributors are working on projects other than updating the files in the main directory, they are encouraged to keep their work in a folder that is named in a way that describes the folder's contents. Some examples might be ml_model_R or alternate_cleaning_python. This should make it easier for new contributors to follow what is happening and make judgements about how to organize their contributions.

Loading and Cleaning Datasets

For each analysis, data needs to be loaded and cleaned into a format that is usable for the current analysis and for future analyses.

After data has been cleaned, both the raw data and the cleaned data should be uploaded to a project-specific data.world repo. Additionally, the project's readme should be updated with a summary of the cleaning process, and any code associated with cleaning should be pushed to the project's GitHub repo.
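As a rough illustration of what a reproducible cleaning step might look like, here is a minimal sketch using only the Python standard library. The column names (committee, amount, description), the sample rows, and the choice to drop rows with a missing amount are all assumptions made for this example. They are not the actual schema or the project's agreed cleaning rules.

```python
import csv
import io

# Hypothetical raw rows. The column names are assumptions for illustration,
# not the real schema of the campaign spending dataset.
RAW_CSV = """committee,amount,description
 Committee A ,"$1,200.50", Printing
committee b,300,TRAVEL
Committee C,,Postage
"""


def clean_row(row):
    """Return a cleaned copy of one record, or None if the amount is missing."""
    # Strip currency formatting so the amount can be parsed as a number.
    amount_text = row["amount"].replace("$", "").replace(",", "").strip()
    if not amount_text:
        # Assumption: rows without an amount are dropped rather than imputed.
        return None
    return {
        "committee": row["committee"].strip().title(),
        "amount": float(amount_text),
        "description": row["description"].strip().lower(),
    }


def clean_csv(text):
    """Parse CSV text and return the list of cleaned records."""
    reader = csv.DictReader(io.StringIO(text))
    cleaned = (clean_row(row) for row in reader)
    return [row for row in cleaned if row is not None]


cleaned_rows = clean_csv(RAW_CSV)
```

Keeping each rule in a small function like clean_row makes it easier to summarize the cleaning process in the readme and to review changes in a pull request.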

Exploratory Analysis

Team members working on exploratory analysis produce general statistics, distributions of important variables, and hypotheses based on an initial exploration of covariation.

When an analysis job is complete, a pull request should be opened against the GitHub repo so that it can be reviewed and edited by collaborators on the project or by a committee of assigned editors.
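For example, a first pass at general statistics could group spending by committee and compute a few summary values. The sketch below uses only the Python standard library. The records and committee names are made up for illustration, and the function name summarize_by_committee is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical cleaned records: (committee name, amount in dollars).
records = [
    ("Committee A", 1200.50),
    ("Committee A", 99.50),
    ("Committee B", 300.00),
    ("Committee B", 500.00),
    ("Committee B", 100.00),
]


def summarize_by_committee(rows):
    """Group amounts by committee and compute basic summary statistics."""
    grouped = defaultdict(list)
    for committee, amount in rows:
        grouped[committee].append(amount)
    return {
        committee: {
            "count": len(amounts),
            "total": sum(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
        }
        for committee, amounts in grouped.items()
    }


summary = summarize_by_committee(records)
for committee, stats in sorted(summary.items()):
    print(committee, stats)
```

Comparing the mean and median per committee is one quick way to spot skewed spending distributions that are worth a closer look.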

Modeling

Team members use modeling techniques to test the hypotheses generated in the exploratory analysis phase and to quantify relationships between variables in the data. Team members may also be working to test specific hypotheses generated by ProPublica.

Algorithms used in the modeling should be vetted through open discussions with the team and through pull requests, and the final model specification should be a collaborative effort that draws on individual findings from those discussions. The project readme should outline these specifications, and the final modeling code should be pushed to the GitHub repo.
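As one simple illustration of quantifying a relationship between two variables, the sketch below fits a straight line by ordinary least squares using plain Python. This is not the project's chosen model. The variable names and the numbers are hypothetical, and a real analysis would use whatever technique the team agrees on.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data: week of the campaign vs. weekly spending (in $1,000s).
weeks = [1, 2, 3, 4]
spending = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(weeks, spending)
print(f"spending = {slope:.2f} * week + {intercept:.2f}")
```

Writing the fit out explicitly keeps the specification easy to discuss in a pull request before the team settles on a final model.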

Reporting

Team members detail the findings in a reproducible report that can be immediately used by ProPublica. All sources and data used should be linked in the report, and the project readme should contain all background on methodology along with links to data and code.
