us_elections_2020_csv's Introduction

US elections 2020

@kjhealy

  • Results are as of the timestamp in each file's name, in the form YYYY_MM_DD_HH_MM_SS. They are not final.

  • NB: The unit of observation in these rows varies. President, Senate, and Governor races are reported at the level of (a) the state and (b) some sub-state place unit, which is usually a county but may be a township or similar place. A variable (fips5) gives the county FIPS code of each place. Below the state level, no vote is reported more than once. So, setting the state-level totals aside, within any given state summing votes across id should yield the same total as summing them by fips5 and then summing those totals.

  • For example, for Massachusetts:

results_df %>% 
  filter(race == "President", id != "0", fips_char == "25", mpc == "1") %>% 
  select(race, id, fips_char, fips5,  place, lname, votes)

## A tibble: 702 x 7
#   race      id         fips_char fips5 place      lname votes
#   <chr>     <chr>      <chr>     <chr> <chr>      <chr> <int>
# 1 President 2500103690 25        25001 Barnstable Biden 15685
# 2 President 2500103690 25        25001 Barnstable Trump 10824
# 3 President 2500107175 25        25001 Bourne     Biden  5988
# 4 President 2500107175 25        25001 Bourne     Trump  5026
# 5 President 2500107980 25        25001 Brewster   Biden  4905
# 6 President 2500107980 25        25001 Brewster   Trump  2334
# 7 President 2500112995 25        25001 Chatham    Biden  3043
# 8 President 2500112995 25        25001 Chatham    Trump  1827
# 9 President 2500116775 25        25001 Dennis     Biden  6179
#10 President 2500116775 25        25001 Dennis     Trump  3934
# … with 692 more rows

First, group by id for township totals, then sum to the state:

results_df %>% 
  filter(race == "President", id != "0", fips_char == "25", mpc == "1") %>% 
  select(race, id, fips_char, fips5,  place, lname, votes) %>% 
  group_by(id, lname) %>%  # Group by ten digit id
  summarize(votes = sum(votes)) %>% 
  group_by(lname) %>% 
  summarize(state_total = sum(votes))

## A tibble: 2 x 2
#  lname state_total
#  <chr>       <int>
#1 Biden     2246208
#2 Trump     1117260

Alternatively, group by fips5 for county totals, then sum to the state:

results_df %>% 
  filter(race == "President", id != "0", fips_char == "25", mpc == "1") %>% 
  select(race, id, fips_char, fips5,  place, lname, votes) %>% 
  group_by(fips5, lname) %>%  # Group by county fips
  summarize(votes = sum(votes)) %>% 
  group_by(lname) %>% 
  summarize(state_total = sum(votes))
  
## A tibble: 2 x 2
#  lname state_total
#  <chr>       <int>
#1 Biden     2246208
#2 Trump     1117260
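The agreement between the two totals can also be checked programmatically. What follows is a minimal sketch, assuming dplyr (1.0 or later) is installed. The toy_df table, its place ids, and its vote counts are invented for illustration and are not taken from the results files.

```r
library(dplyr)

# Toy stand-in for results_df. All ids and vote counts here are invented:
# two places in county 25001 and one place in county 25003.
toy_df <- data.frame(
  id     = c("2500100001", "2500100001", "2500100002",
             "2500100002", "2500300001", "2500300001"),
  fips5  = c("25001", "25001", "25001", "25001", "25003", "25003"),
  lname  = c("Biden", "Trump", "Biden", "Trump", "Biden", "Trump"),
  votes  = c(100L, 60L, 40L, 50L, 30L, 10L),
  stringsAsFactors = FALSE
)

# Route 1: place-level totals (by id), then sum to the state.
by_id <- toy_df %>%
  group_by(id, lname) %>%
  summarize(votes = sum(votes), .groups = "drop") %>%
  group_by(lname) %>%
  summarize(state_total = sum(votes), .groups = "drop")

# Route 2: county-level totals (by fips5), then sum to the state.
by_fips5 <- toy_df %>%
  group_by(fips5, lname) %>%
  summarize(votes = sum(votes), .groups = "drop") %>%
  group_by(lname) %>%
  summarize(state_total = sum(votes), .groups = "drop")

# Both routes should give the same per-candidate state totals.
stopifnot(identical(by_id$lname, by_fips5$lname))
stopifnot(identical(by_id$state_total, by_fips5$state_total))
```

Running the same comparison on the real file, state by state, is one way to catch rows that were counted twice or assigned to the wrong county.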

For many states, id and fips5 will be identical as all results are reported by county. But for states reporting by township, you must sum by fips5 to get county-level results:

## MA reported by township

results_df %>% 
  filter(race == "President", id != "0", fips_char == "25", mpc == "1") %>% 
  select(race, id, fips_char, fips5,  place, lname, votes)
  
## A tibble: 702 x 7
#   race      id         fips_char fips5 place      lname votes
#   <chr>     <chr>      <chr>     <chr> <chr>      <chr> <int>
# 1 President 2500103690 25        25001 Barnstable Biden 15685
# 2 President 2500103690 25        25001 Barnstable Trump 10824
# 3 President 2500107175 25        25001 Bourne     Biden  5988
# 4 President 2500107175 25        25001 Bourne     Trump  5026
# 5 President 2500107980 25        25001 Brewster   Biden  4905
# 6 President 2500107980 25        25001 Brewster   Trump  2334
# 7 President 2500112995 25        25001 Chatham    Biden  3043
# 8 President 2500112995 25        25001 Chatham    Trump  1827
# 9 President 2500116775 25        25001 Dennis     Biden  6179
#10 President 2500116775 25        25001 Dennis     Trump  3934

## MA county names and fips codes

tmp <- county_data %>% 
  as_tibble() %>% 
  select(id, name, state) %>% 
  filter(state == "MA") %>% 
  rename(fips5 = id)

tmp
  
#> tmp
## A tibble: 15 x 3
#   fips5 name              state
#   <chr> <chr>             <fct>
# 1 25000 22                MA   
# 2 25001 Barnstable County MA   
# 3 25003 Berkshire County  MA   
# 4 25005 Bristol County    MA   
# 5 25007 Dukes County      MA   
# 6 25009 Essex County      MA   
# 7 25011 Franklin County   MA   
# 8 25013 Hampden County    MA   
# 9 25015 Hampshire County  MA   
#10 25017 Middlesex County  MA   
#11 25019 Nantucket County  MA   
#12 25021 Norfolk County    MA   
#13 25023 Plymouth County   MA   
#14 25025 Suffolk County    MA   
#15 25027 Worcester County  MA   
  
## Aggregate by county FIPS and merge county names.

results_df %>% 
  filter(race == "President", id != "0", fips_char == "25", mpc == "1") %>% 
  select(race, id, fips_char, fips5,  place, lname, votes) %>% 
  group_by(fips5, lname) %>%  # Group by county fips
  summarize(votes = sum(votes)) %>% 
  left_join(tmp, by = "fips5")
  
# # A tibble: 28 x 5
# # Groups:   fips5 [14]
#    fips5 lname  votes name              state
#    <chr> <chr>  <int> <chr>             <fct>
#  1 25001 Biden  89732 Barnstable County MA   
#  2 25001 Trump  54132 Barnstable County MA   
#  3 25003 Biden  41521 Berkshire County  MA   
#  4 25003 Trump  14015 Berkshire County  MA   
#  5 25005 Biden 150063 Bristol County    MA   
#  6 25005 Trump 118085 Bristol County    MA   
#  7 25007 Biden   9762 Dukes County      MA   
#  8 25007 Trump   2587 Dukes County      MA   
#  9 25009 Biden 259792 Essex County      MA   
# 10 25009 Trump 141135 Essex County      MA   
# # … with 18 more rows  
#   

Columns

  • race: President, Senate, House, Governor
  • id: Variable length character. Codes are as follows:
    • For President, Governor, and Senate races, one of: (a) "0", if the row refers to results for a whole state (identify states using fips_char instead); (b) a five-digit county FIPS code, if the row refers to results for a county; (c) a ten-digit FIPS location code, for results from a township or similar location (the first five characters are this location's county FIPS). Note zero padding.
    • For House races only: A four-digit code consisting of a two-digit State FIPS + two-digit House District. Note zero padding.
    • This column should be parsed as character, not numeric.
  • fips_char: Two digit state FIPS code. Note zero padding. This column should be parsed as character, not numeric.
  • fips5: Five digit FIPS code identifying the county the place is in. Note zero padding. This column should be parsed as character, not numeric.
  • place: State name, or place name. House races are reported by District and have NA for place names. In some states (for example, Vermont), the Presidential, Senate, and Governor results are reported by township or similar location, not county. Thus, (a) place is not county and (b) if you filter out rows where id is "0" (i.e., whole states), the rows you are left with are still not unique counties. To get true county-level results for these races you will have to aggregate vote counts in the rows by fips5.
  • fname: Candidate first name
  • lname: Candidate last name
  • party: Three-letter party code
  • pab: One letter party code
  • votes: Number of votes
  • incumbent: 1 = is incumbent, 0 otherwise
  • mpc: 1 if candidate is a main party candidate (Rep or Dem), 0 otherwise
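The parsing advice and the id coding above can be sketched together as follows. This is an illustration only, assuming readr and dplyr are installed: the sample rows are made up (they are not real results), the file is a temporary one written on the spot so the example runs offline, and the unit column and its labels are hypothetical names introduced here, not part of the data.

```r
library(readr)
library(dplyr)

# Made-up sample rows using the documented column names.
sample_csv <- c(
  "race,id,fips_char,fips5,place,fname,lname,party,pab,votes,incumbent,mpc",
  "President,0,01,NA,Alabama,Jane,Doe,Dem,D,100,0,1",
  "President,01001,01,01001,Autauga,Jane,Doe,Dem,D,10,0,1",
  "President,2500103690,25,25001,Barnstable,Jane,Doe,Dem,D,20,0,1",
  "House,0101,01,NA,NA,Jane,Doe,Dem,D,30,0,1"
)
path <- tempfile(fileext = ".csv")
writeLines(sample_csv, path)

# Read everything as character by default so that zero padding in
# id, fips_char, and fips5 survives; only the count columns are integers.
results_df <- read_csv(
  path,
  col_types = cols(
    .default  = col_character(),
    votes     = col_integer(),
    incumbent = col_integer(),
    mpc       = col_integer()
  )
)

# Label each row by the kind of unit its id refers to.
# "unit" and its values are hypothetical names for this sketch.
results_df <- results_df %>%
  mutate(unit = case_when(
    race == "House" ~ "house_district",
    id == "0"       ~ "state",
    nchar(id) == 5  ~ "county",
    nchar(id) == 10 ~ "sub_county_place",
    TRUE            ~ "unrecognized"
  ))
```

Rows that end up labelled unrecognized do not match any of the documented id patterns and are worth inspecting by hand.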

FIPS codes

The state-level FIPS codes, extracted from a page at the NRCS, are in fips.csv. The columns are as follows:

  • state: The state or region, e.g. 'Alabama'
  • abbr: The postal code, e.g. 'AL'
  • fips_char: The two-digit state FIPS code, e.g. '01'; note zero padding.
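Because state-level rows carry "0" in id, state names have to come from fips_char. A minimal sketch of that lookup follows, assuming dplyr is installed. The fips table is a two-row stand-in for fips.csv, and state_rows, with its candidate name and vote counts, is invented for illustration. When reading the real fips.csv, fips_char should be parsed as character so that the leading zero is kept.

```r
library(dplyr)

# Two-row stand-in for fips.csv, using its three documented columns.
fips <- data.frame(
  state     = c("Alabama", "Massachusetts"),
  abbr      = c("AL", "MA"),
  fips_char = c("01", "25"),
  stringsAsFactors = FALSE
)

# Invented state-level rows (id == "0") standing in for the results file.
state_rows <- data.frame(
  race      = c("President", "President"),
  id        = c("0", "0"),
  fips_char = c("25", "01"),
  lname     = c("Doe", "Doe"),
  votes     = c(200L, 100L),
  stringsAsFactors = FALSE
)

# Attach the state name and postal code via the shared fips_char key.
labelled <- state_rows %>%
  left_join(fips, by = "fips_char")
```

A left join keeps every results row even when no match is found, so an unmatched fips_char shows up as NA in state and abbr rather than silently dropping the row.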

us_elections_2020_csv's People

Contributors

andrewpbray, ftrain, kjhealy

us_elections_2020_csv's Issues

Presidential vote by congressional district

Thanks for this amazing resource! Do you have any plans to add presidential votes by congressional district? (Happy to try to submit a PR if the code is somewhere.)

Four bad rows

There are four bad rows in the data.

library(tidyverse)

read_csv("https://raw.githubusercontent.com/kjhealy/us_elections_2020_csv/master/results_x2020_11_07_15_18_17.csv") %>% 
  filter(id %in% c("Iowa", "Ohio", "Utah")) %>% 
  select(1:4)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   race = col_character(),
#>   id = col_character(),
#>   fips_char = col_character(),
#>   fips5 = col_character(),
#>   place = col_character(),
#>   fname = col_character(),
#>   lname = col_character(),
#>   party = col_character(),
#>   pab = col_character(),
#>   votes = col_double(),
#>   incumbent = col_double(),
#>   mpc = col_double()
#> )
#> # A tibble: 4 x 4
#>   race     id    fips_char fips5
#>   <chr>    <chr> <chr>     <chr>
#> 1 Governor Iowa  19        <NA> 
#> 2 Senate   Ohio  39        <NA> 
#> 3 Governor Ohio  39        <NA> 
#> 4 Senate   Utah  49        <NA>

Created on 2020-11-07 by the reprex package (v0.3.0)

There should not be a state name in the id column. Also, at least some of these elections did not happen this year. That is, there was no election for governor in Iowa, for example.

Apologies for not making this report more clear in my previous submission.
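A more general form of the check in this report is to flag every row whose id is not made up of digits only, rather than listing state names by hand. The following is a minimal sketch, assuming dplyr is installed. The toy_df table is invented to mimic the shape of the bad rows shown above and is not read from the repository.

```r
library(dplyr)

# Invented rows: two with a state name in id, two with well-formed ids.
toy_df <- data.frame(
  race      = c("Governor", "Senate", "President", "House"),
  id        = c("Iowa", "Ohio", "19001", "1901"),
  fips_char = c("19", "39", "19", "19"),
  stringsAsFactors = FALSE
)

# Keep rows whose id contains anything other than digits.
# grepl() returns FALSE for NA, so rows with a missing id are flagged too.
bad_rows <- toy_df %>%
  filter(!grepl("^[0-9]+$", id))
```

This relies on id having been read as character, as the column notes recommend.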

What is the data source?

Hi. Many thanks for creating and sharing this file! What is the source of this info? I am assuming it is collected from one of the major news organizations, but I can't be certain. If I want to cite this data in a blog post (or some other informal venue), how do I do it?

minor data problems

I don't think id should be a state name like Iowa.

library(tidyverse)

read_csv("https://raw.githubusercontent.com/kjhealy/us_elections_2020_csv/master/results_x2020_11_07_08_13_28.csv", 
              col_types = cols(race = col_character(),
                               fips_char = col_character(),
                               place = col_character(),
                               id = col_character(),
                               fname = col_character(),
                               lname = col_character(),
                               party = col_character(),
                               pab = col_character(),
                               votes = col_double(),
                               incumbent = col_double(),
                               mpc = col_double())) %>% 
    filter(race != "House", nchar(id) == 4)
#> # A tibble: 4 x 12
#>   race  id    fips_char fips5 place fname lname party pab   votes incumbent
#>   <chr> <chr> <chr>     <chr> <chr> <chr> <chr> <chr> <chr> <dbl>     <dbl>
#> 1 Gove… Iowa  19        <NA>  <NA>  <NA>  <NA>  <NA>  <NA>     NA        NA
#> 2 Sena… Ohio  39        <NA>  <NA>  <NA>  <NA>  <NA>  <NA>     NA        NA
#> 3 Gove… Ohio  39        <NA>  <NA>  <NA>  <NA>  <NA>  <NA>     NA        NA
#> 4 Sena… Utah  49        <NA>  <NA>  <NA>  <NA>  <NA>  <NA>     NA        NA
#> # … with 1 more variable: mpc <dbl>

Created on 2020-11-07 by the reprex package (v0.3.0)
