congress
is a package designed to allow a user with only basic
knowledge of R interact with CongressData, a dataset with nearly 800
variables that compiles information about all US congressional districts
across 1789-2021. Specifically, the dataset tracks characteristics of
congressional districts, the members of congress themselves, and the
behavior of those members in policymaking. Users can find variables
related to demographics, politics, and policy; subset the data across
multiple dimensions; create custom aggregations of the dataset; and
access citations in both plain text and BibTeX for every variable. An
associated web application is available
here and the data-only package
is here.
The CongressData codebook is available here.
The package’s manual contains information regarding each function and its arguments. It is available here: congress manual.
congress
is a functional package that interacts with CongressData. We
maintain the dataset in another package called congressData
. You can
use the data-only package if you simply want to access the data. Install
them from GitHub like so:
# use the devtools library to download the package from GitHub
library(devtools)
# this will download congressData as well (NOTE: installation can take several minutes)
install_github("ippsr/congress")
# if there are issues or you only want to download congressData (NOTE: installation can take several minutes)
install_github("ippsr/congressData")
get_var_info
: Retrieve information regarding variables in CongressData
and identify variables of interest with get_var_info
. The function
allows you to search to codebook to find the years each variable is
observed in the data; a short and long description of each variable; and
the source and citation/s for each variable. Citations are available in
both bibtex and plain text. Use the function to search for broad terms
like ‘tax’ with the related_to
argument and/or partial-match variable
names with var_names
.
suppressMessages(library(dplyr))
library(congress)
#> Please cite:
#> Grossmann, M., Lucas, C., McCrain, J, & Ostrander, I. (2022). CongressData.
#> East Lansing, MI: Institute for Public Policy and Social Research (IPPSR).
#>
#> You are using the version of the Congress Data stored in your local copy of congressData. Running `congressData::get_congress_version()` will print your local version number.
# variables related to health insurance
h_ins_cong <- get_var_info(related_to = "health insurance")
cat("There are",nrow(h_ins_cong),"variables related to health insurance in CongressData")
#> There are 41 variables related to health insurance in CongressData
head(h_ins_cong$variable)
#> [1] "percent_under18_healthins" "percent_private_under18"
#> [3] "percent_public_under18" "percent_privpub_under18"
#> [5] "percent_pop18_34" "percent_private_18_34"
# variables with 'under18' in their name
under18_cong <- get_var_info(var_names = "under18")
head(under18_cong$variable)
#> [1] "percent_under18" "percent_under18_healthins"
#> [3] "percent_private_under18" "percent_public_under18"
#> [5] "percent_privpub_under18" "under18"
get_var_info
returns the following information to simplify using
CongressData:
- variable: Variable name
- year: The precise years the variable is observed
- short_desc: A short description of the variable
- long_desc: A long description of the variable
- source: The sources of the data
- category: the variable’s category (not all are coded)
- plaintext_cite (1-4): Plain text citations for the data
- bibtext_cite (1-4): BibTeX citation for the data
get_cong_data
: Access all or a part of CongressData with
get_cong_data
. Subset by state names with state
and years with
years
(either a single year or a two-year vector that represents the
min/max of what you want). You can also use the related_to
argument to
search across variable names, short/long descriptions from the codebook,
and citations for non-exact matches of a supplied term. For example,
searching ‘tax’ will return variables with words like ‘taxes’ and
‘taxable’ in any of those columns.
# return the entire dataset
all_the_dat <- get_cong_data()
# subset by state, topic, and years
cong_subset <- get_cong_data(states = c("Kentucky","Michigan","Pennsylvania")
,related_to = "tax"
,years = c(1960,1980)
)
aggregate_cong_dat
: Choose how to aggregate the member-year data into
member-session data across subsets (e.g. data sources) of CongressData.
You can choose either Mean
or Sum
or First
(meaning the value in
the first year of the session) to aggregate the following chunks of the
dataset:
- census_nonperc_vars: Non percent Census Variables
- census_perc_vars Percent Census Variables
- bill_vars: Congressional Bills Project Variables
- com_vars: Committee Assignment Variables
- else_vars: All the Other Variables
The variable names in the resulting dataset will reflect how they were
aggregated (e.g. nbills_major_topic_10
becomes
nbills_major_topic_10_mean
). Note: while character variables will
reflect the chosen aggregation scheme (e.g. character_var_sum
), they
are aggregated by pasting their unique values together.
# import the data using the default mean values
agg_cong <- aggregate_cong_dat()
# choose specific aggregation methods by subgroup
cong_custom <- aggregate_cong_dat(census_nonperc_vars = "Mean"
,census_perc_vars = "Sum"
,bill_vars = "First"
,com_vars = "Mean"
,else_vars = "Sum")
# default aggs for specific states/sessions and vars related to health
cong_subset <- aggregate_cong_dat(states = c("Kentucky","Michigan","Pennsylvania")
,related_to = "health"
,sessions = c(50:55)
)
get_var_info
: Each variable in CongressData was collected from
external sources, please use get_var_info
to obtain their citations
(plain text and BibTeX). We’ve made it easy to cite the source of each
variable you use with the get_var_info
function described above.
Supply a vector of variable names to the function with the var_names
function and collect the citations provided in the plain text or BibTeX
columns. NOTE: Some variables have multiple citations, so do check you
have them all.
# bibtex is also available
get_var_info(var_names = "com_benghazi_299") %>%
pull(plaintext_cite)
#> [1] "Charles Stewart III and Jonathan Woon. Congressional Committee Assignments, 103rd to 114th Congresses, 1993--2017: House of Representatives, 2017.\n"
# bibtex is also available
get_var_info(var_names = "percent_bus") %>%
pull(plaintext_cite)
#> [1] "U.S. Census Bureau. (2022). 2009-2019 American Community Survey 1-year Estimates. Retrieved from the Census Bureau Data API."
For questions about congressData please reach out to Benjamin Yoel ([email protected]).
In addition to citing each variable’s source, we ask that you cite CongressData if use this package or the dataset. A recommended citation is below for congressData or for the specific version of the datase.
Grossmann, M., Lucas, C., McCrain, J, & Ostrander, I. (2022). CongressData. East Lansing, MI: Institute for Public Policy and Social Research (IPPSR)
Rapanos, A., & Yoel, B. (2024). CongressData v1.4. East Lansing, MI: Institute for Public Policy and Social Research (IPPSR)