Shiro Kuriwaki
This repository is R code to build the Cooperative Congressional Election Study (CCES) cumulative file (2006 - 2022).
Please feel free to file any questions or requests about the cumulative file as Github issues.
Start by downloading either the .dta
, .Rds
, or .feather
file on
the dataverse
page
to your computer. This repository does not track the data due to size
constraints, but feel free to contact me if you need the newest version
not on Dataverse. The .Rds
format can be read into R.
dat <- readRDS("cumulative_2006-2022.Rds")
Make sure to load the tidyverse
package first. The Rds file can be
dealt with as a base-R data.frame, but it was built completely in the
tidyverse
environment so using it as a tibble
gives full features.
library(tidyverse)
dat
## # A tibble: 617,455 × 103
## year case_id weight weight_cumulative state st cong cong_up
## * <int> <chr> <dbl> <dbl> <int+lbl> <int+l> <fct> <fct>
## 1 2006 439219 1.85 1.67 37 [North Carol… 37 [NC] 109 110
## 2 2006 439224 0.968 0.872 39 [Ohio] 39 [OH] 109 110
## 3 2006 439228 1.59 1.44 34 [New Jersey] 34 [NJ] 109 110
## 4 2006 439237 1.40 1.26 17 [Illinois] 17 [IL] 109 110
## 5 2006 439238 0.903 0.813 36 [New York] 36 [NY] 109 110
## 6 2006 439242 0.839 0.756 48 [Texas] 48 [TX] 109 110
## 7 2006 439251 0.777 0.700 27 [Minnesota] 27 [MN] 109 110
## 8 2006 439254 0.839 0.756 32 [Nevada] 32 [NV] 109 110
## 9 2006 439255 0.331 0.299 48 [Texas] 48 [TX] 109 110
## 10 2006 439263 1.10 0.993 24 [Maryland] 24 [MD] 109 110
## # ℹ 617,445 more rows
## # ℹ 95 more variables: state_post <int+lbl>, st_post <int+lbl>, dist <int>,
## # dist_up <int>, cd <chr>, cd_up <chr>, dist_post <int>, dist_up_post <int>,
## # cd_post <chr>, cd_up_post <chr>, zipcode <chr>, county_fips <chr>,
## # tookpost <int+lbl>, weight_post <dbl>, rvweight <dbl>, rvweight_post <dbl>,
## # starttime <dttm>, pid3 <int+lbl>, pid3_leaner <int+lbl>, pid7 <int+lbl>,
## # ideo5 <fct>, gender <int+lbl>, sex <int+lbl>, gender4 <int+lbl>, …
A Stata .dta
can also be read in by Stata, or in R through
haven::read_dta()
. You will need the haven
package loaded.
The arrow files can be loaded with arrow::read_feather()
. They are
currently modeled so that it would give the same output as reading the
dta file.
Each row is a respondent, and each variable is information associated with that respondent. Note that this cumulative dataset extracts only a couple of key variables from each year’s CCES, which has hundreds of columns.
Most variables in this dataset come straight from each year’s CCES. However, it renames and standardizes variable names, making them accessible in one place. Please see the guide or the Crunch dataset for a full list and description of these variables.
The cumulative file has added candidate name and identifiers that a
respondent chose. In the original year-specific datasets, the response
values for a vote choice question is usually a generic label,
e.g. Candidate1
and Candidate2
(with separate look-up variables
appended). The cumulative dataset shows both the generic label and the
chosen candidate’s name, party, and identifier, which will vary across
individuals.
select(dat, year, case_id, matches("voted_sen"))
## # A tibble: 617,455 × 5
## year case_id voted_sen voted_sen_party voted_sen_chosen
## <int> <chr> <fct> <fct> <chr>
## 1 2006 439219 <NA> <NA> <NA>
## 2 2006 439224 [Democrat / Candidate 1] Democratic Sherrod C. Brown (…
## 3 2006 439228 [Democrat / Candidate 1] Democratic Robert Menendez (D)
## 4 2006 439237 <NA> <NA> <NA>
## 5 2006 439238 [Democrat / Candidate 1] Democratic Hillary Rodham Cli…
## 6 2006 439242 I Did Not Vote In This Race <NA> <NA>
## 7 2006 439251 [Republican / Candidate 2] Republican Mark Kennedy (R)
## 8 2006 439254 [Democrat / Candidate 1] Democratic Jack Carter (D)
## 9 2006 439255 [Democrat / Candidate 1] Democratic Barbara Ann Radnof…
## 10 2006 439263 I Did Not Vote In This Race <NA> <NA>
## # ℹ 617,445 more rows
A version of the dataset is also included in Crunch, a database platform that makes it easy to view and analyze survey data either with our without any programming experience. For access to View the dataset (free), please sign up here: https://harvard.az1.qualtrics.com/jfe/form/SV_066hQi4Eeco3Kap. Some features include:
For questions and more access, please contact the CCES Team.
Crunch datasets can also be manipulated from a R package, crunch
:
https://github.com/Crunch-io/rcrunch.
install.packages("crunch")
For a bit more on using the R crunch package for your own purposes, see the crunch package vignettes, pkgdown website, or a short vignette in this repo.
R scripts 01
- 07
reproduce the cumulative dataset starting from
each year’s CCES on dataverse.
01_define-names-labels.R
constructs two variable name tables – one that names and describes each variable to be in the final dataset, and another that indicates which variables corresponds to the candidate columns in each year’s CCES.02_download-cces-dataverse.R
indicates a (partial) way to download the component CCES data from dataverse so that the rest of the code can be run.03_read-common.R
pulls out the common contents with minimal formatting (e.g. state, case identifier variable names)04_prepare-fixes.R
makes some fixes to variables in each year’s datasets.05_stack-cumulative.R
pulls out the variables of interest from annual CCES files, we stack this into a long dataset where each row is a respondent from CCES.06_extract_politicians.R
pulls out the “contextual variables” at the respondent-level. information on candidates and representatives. It uses some long format voting tables from05
.07_merge-contextual_upload.R
combines all the variables together, essentially combining the output of04
on05
. Saves a.Rds
andsav
version.08_format-crunch.R
logs into Crunch, and adds variable names, descriptions, groupings, and other Crunch attributes to the Crunch dataset. It also adds variables and exports a.dta
version
More scripts are in 00_prepare
, they format other datasets like
NOMINATE, CQ, and DIME.