Commit

added ACS 2016-2020 5-year data, too big for CRAN, not submitted
GL-Li committed Mar 19, 2022
1 parent 6f0cb5e commit f1f68e5
Showing 37 changed files with 32,762 additions and 1,026 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -8,3 +8,4 @@ README.*
^\.travis\.yml$
test
update_notes
deleted_files
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,10 +1,10 @@
Package: totalcensus
Type: Package
Title: Extract Decennial Census and American Community Survey Data
Version: 0.6.5
Version: 0.6.6
Author: Guanglai Li
Maintainer: Guanglai Li <[email protected]>
Date: 2020-12-10
Date: 2022-03-19
Description: Download summary files from Census Bureau <https://www2.census.gov/>
and extract data, in particular high resolution data at
block, block group, and tract level, from decennial census and
@@ -27,4 +27,4 @@ Suggests:
rmarkdown,
ggmap,
ggplot2
RoxygenNote: 7.1.1
RoxygenNote: 7.1.2
28 changes: 28 additions & 0 deletions R/data_acs.R
@@ -1,3 +1,30 @@
# lookup_acs5year_2020 =========================================================
#' ACS 5-year 2020 file segment and table lookup data
#'
#' @docType data
#'
#' @format A data.table with 27850 rows and 7 variables
#' \describe{
#' \item{file_segment}{sequence number of segment data files, from "0001" to "0122"}
#' \item{table_content}{description of columns in a table}
#' \item{reference}{reference of the table content, such as "B01001_002". The reference
#' is used to extract data of table content.}
#' \item{restriction}{restrictions applied to the table_content}
#' \item{table_number}{table number such as "B01001"}
#' \item{table_name}{description of table. A table has multiple columns (table_content)}
#' \item{universe}{the universe of the data}
#' }
#'
#' @keywords datasets
#'
#' @source Check for each year of ACS 1-year and 5-year
#' \href{https://www.census.gov/programs-surveys/acs/data/summary-file.html}{Sequence Number/Table Number Lookup File}.
#'

"lookup_acs5year_2020"



# lookup_acs5year_2019 =========================================================
#' ACS 5-year 2019 file segment and table lookup data
#'
@@ -915,6 +942,7 @@
#' \describe{
#' \item{table_number}{table number such as "C27013"}
#' \item{table_name}{description of the table}
#' \item{acs5_2020}{whether the table is available in 2020}
#' \item{acs5_2019}{whether the table is available in 2019}
#' \item{acs5_2018}{whether the table is available in 2018}
#' \item{acs5_2017}{whether the table is available in 2017}
77 changes: 75 additions & 2 deletions R/download_census.R
@@ -6,7 +6,8 @@
#' United States Census Bureau. It also downloads
#' generated data from Census 2010 if they do not exist.
#'
#' @param survey Which survey to download from, "decennial", "acs5year", or "acs1year"
#' @param survey Which survey to download from, "decennial", "acs5year",
#' "acs1year", or "redistricting".
#' @param year year or ending year of the survey
#' @param states vector of abbreviations of states such as c("MA", "RI")
#'
@@ -30,8 +31,10 @@ download_census <- function(survey, year, states = c(states_DC, "US", "PR")){
download_acs5year_(year, states)
} else if (survey == "acs1"){
download_acs1year_(year, states)
} else if (survey == "redistricting") {
download_redistricting_(year, states)
} else {
message('Please select a survey from "dec" (or "decennial"), "acs5", or "acs1".')
message('Please select a survey from "dec" (or "decennial"), "acs5", "acs1", or "redistricting".')
}

options(timeout = 60)
@@ -566,6 +569,74 @@ download_acs1year_1_state_ <- function(year, state){
}


download_redistricting_ <- function(year, states){

i <- 0
N <- length(states)
for (st in states){
i <- i + 1
cat(paste0("Downloading ", i, " of ", N, " states.\n"))
cat(paste0("Downloading ", st, " summary files of Census ", year, ".\n"))
download_redistricting_1_state_(year, st)
}
}



download_redistricting_1_state_ <- function(year, state){
# download census 2020 redistricting data from:
# https://www2.census.gov/programs-surveys/decennial/2020/data/01-Redistricting_File--PL_94-171/
#
# Args_____
# year: census year
# state : abbreviation of the state

state <- toupper(state)

# temp folder to hold all downloaded data
path_to_census <- Sys.getenv("PATH_TO_CENSUS")


# construct right names for url
state_names <- dict_fips[, .(full = state_full, abbr = state_abbr)] %>%
unique() %>%
.[, full := str_replace_all(full, " ", "_")] %>%
# the US data is named as "National" in the download sites
.[abbr == "US", full := "National"]

full <- state_names[abbr == state, full]

if (year == 2020){
url <- paste0(
"https://www2.census.gov/programs-surveys/decennial/2020/data/01-Redistricting_File--PL_94-171/",
full, "/", tolower(state), "2020.pl.zip"
)

save_as <- paste0(path_to_census, "/", tolower(state), ".zip")
download.file(url, save_as, method = "auto")

# unzip downloaded file
cat(paste0("Unzipping downloaded zip file of ", state, "\n"))
unzip(save_as, exdir = paste0(path_to_census, "/redistricting", year, "/", state))
cat("File unzipped successfully\n")

# delete downloaded file to save space
file.remove(save_as)
cat("Deleted downloaded zip file\n\n")

} else if (year == 2030){
# waiting for 10 years
}

}
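
The URL construction in `download_redistricting_1_state_()` can be looked at in isolation. The following is a minimal base R sketch, not the package's code: `redistricting_url()` and its arguments are hypothetical names, and the full state name is passed in directly rather than looked up in `dict_fips`. It assumes the same folder layout as the URL used in the function above.

```r
# Sketch only: build the 2020 PL 94-171 zip URL for one state.
# `redistricting_url`, `state_full` and `state_abbr` are hypothetical names,
# not part of totalcensus.
redistricting_url <- function(state_full, state_abbr) {
  base <- paste0(
    "https://www2.census.gov/programs-surveys/decennial/2020/data/",
    "01-Redistricting_File--PL_94-171/"
  )
  # folder names on the download site use underscores instead of spaces
  folder <- gsub(" ", "_", state_full, fixed = TRUE)
  # the national files are stored under "National"
  if (toupper(state_abbr) == "US") {
    folder <- "National"
  }
  paste0(base, folder, "/", tolower(state_abbr), "2020.pl.zip")
}

url_ri <- redistricting_url("Rhode Island", "RI")
url_us <- redistricting_url("United States", "US")
print(url_ri)
print(url_us)
```

Keeping the URL logic in a pure function like this would make it possible to check the generated paths without touching the network.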










convert_geo_txt2csv_acs1year_ <- function(txt_file, year){
@@ -607,3 +678,5 @@ convert_geo_txt2csv_acs1year_ <- function(txt_file, year){
return(geo)
}



4 changes: 3 additions & 1 deletion R/quiet_global_variable_in_cran_check.R
@@ -50,5 +50,7 @@ utils::globalVariables(unique(c(
"content_acs1_2018", "name_acs1_2018", "universe_acs1_2018",
"content_acs5_2018", "name_acs5_2018", "universe_acs5_2018",
"content_acs1_2019", "name_acs1_2019", "universe_acs1_2019", "table_content_acs1year_all_years",
"content_acs5_2019", "name_acs5_2019", "universe_acs5_2019"
"content_acs5_2019", "name_acs5_2019", "universe_acs5_2019",
"content_acs5_2020", "name_acs5_2020", "universe_acs5_2020",
"dict_redistricting_geoheader_2020"
)))
9 changes: 7 additions & 2 deletions R/search_census_and_acs.R
@@ -707,6 +707,7 @@ generate_acs5_tablecontents_ <- function(){
acs5_2017 <- modify_lookup_table_(5, 2017)
acs5_2018 <- modify_lookup_table_(5, 2018)
acs5_2019 <- modify_lookup_table_(5, 2019)
acs5_2020 <- modify_lookup_table_(5, 2020)

dict_acs_tablecontent <- reduce(list(acs5_2009,
acs5_2010,
@@ -718,7 +719,8 @@ generate_acs5_tablecontents_ <- function(){
acs5_2016,
acs5_2017,
acs5_2018,
acs5_2019),
acs5_2019,
acs5_2020),
merge, by = "reference", all = TRUE) %>%

# add the following lines for year since 2013
@@ -743,10 +745,13 @@ generate_acs5_tablecontents_ <- function(){
.[!is.na(acs5_2019), ":=" (table_content = content_acs5_2019,
table_name = name_acs5_2019,
universe = universe_acs5_2019)] %>%
.[!is.na(acs5_2020), ":=" (table_content = content_acs5_2020,
table_name = name_acs5_2020,
universe = universe_acs5_2020)] %>%

# include all years and surveys
.[, .(reference, table_content, table_name,
acs5_2019, acs5_2018, acs5_2017, acs5_2016, acs5_2015,
acs5_2020, acs5_2019, acs5_2018, acs5_2017, acs5_2016, acs5_2015,
acs5_2014, acs5_2013, acs5_2012, acs5_2011,
acs5_2010, acs5_2009,
universe)]
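
The year-by-year merge in `generate_acs5_tablecontents_()` amounts to a full outer join on `reference`, so a table content that exists in only some years still gets a row. Here is a small sketch of that idea using base R `Reduce()` and `merge()` on toy data frames; the package itself uses `reduce()` with data.table objects, and the three lookup tables below are invented for illustration.

```r
# Toy lookup tables (invented data): one availability column per year.
lookup_2018 <- data.frame(reference = c("B01001_001", "B01001_002"),
                          acs5_2018 = "yes", stringsAsFactors = FALSE)
lookup_2019 <- data.frame(reference = c("B01001_002", "B01001_003"),
                          acs5_2019 = "yes", stringsAsFactors = FALSE)
lookup_2020 <- data.frame(reference = c("B01001_003", "B01001_004"),
                          acs5_2020 = "yes", stringsAsFactors = FALSE)

# Full outer join of two tables on the shared key column.
full_join_by_reference <- function(x, y) {
  merge(x, y, by = "reference", all = TRUE)
}

# Fold the join over the list of yearly tables, left to right.
availability <- Reduce(full_join_by_reference,
                       list(lookup_2018, lookup_2019, lookup_2020))
print(availability)
```

A reference missing from a given year shows up as `NA` in that year's column, which is what the `!is.na(acs5_2020)` style filters in the pipeline rely on.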
2 changes: 2 additions & 0 deletions README.Rmd
@@ -25,6 +25,8 @@ Download summary files from [Census Bureau](https://www2.census.gov/) and extrac


## Update
**03/19/2022**: ACS 2016-2020 5-year data is available in the development version.

**12/10/2020**: Version 0.6.5 is on CRAN. The 2019 ACS 5 year data was added to the package. The package now includes all latest data since 2000:

- Decennial census 2000 and 2010
47 changes: 22 additions & 25 deletions README.md
@@ -1,18 +1,20 @@
<!-- README.md is generated from README.Rmd. Please edit that file -->

[![Build
Status](https://travis-ci.org/GL-Li/totalcensus.svg?branch=master)](https://travis-ci.org/GL-Li/totalcensus)
![](http://www.r-pkg.org/badges/version/totalcensus)
![](https://cranlogs.r-pkg.org/badges/grand-total/totalcensus)
![](https://cranlogs.r-pkg.org/badges/totalcensus)

Extract Decennial Census and American Community Survey Data
===========================================================
# Extract Decennial Census and American Community Survey Data

Download summary files from [Census Bureau](https://www2.census.gov/)
and extract data from the summary files.

Update
------
## Update

**03/19/2022**: ACS 2016-2020 5-year data is available in the
development version.

**12/10/2020**: Version 0.6.5 is on CRAN. The 2019 ACS 5 year data was
added to the package. The package now includes all latest data since
@@ -22,8 +24,7 @@ added to the package. The package now includes all latest data since
- ACS 1 year: 2005 - 2019
- ACS 5 year: 2009 - 2019

Installation and setup
----------------------
## Installation and setup

### Installation

@@ -49,8 +50,7 @@ library(totalcensus)
set_path_to_census("xxxxx/my_census_data")
```

Introduction
------------
## Introduction

This package extracts data directly from summary files of Decennial
Censuses and American Community Surveys (ACS). The summary files store
@@ -116,8 +116,7 @@ There are additional benefits of using this package:
entities](https://gl-li.netlify.com/2017/12/28/use-totalcensus-package-to-determine-relationship-between-geographic-entities/);
an application example.

How to use the package
----------------------
## How to use the package

### `read_xxxx()` functions

@@ -133,33 +132,33 @@ The function arguments serve as filters to select the data you want:
- states: the states for which you want to read geography and data files.
In addition to 50 states and “DC”, you can choose from “PR” (Puerto
Rico), plus a special one “US” for national files.
- table\_contents: this parameter specifies which table contents you
want to read. Population is always returned even if table\_contents
- table_contents: this parameter specifies which table contents you
want to read. Population is always returned even if table_contents
is NULL. Users can name the table contents in the format such as
`c("male = B01001_002", "female = B01001_026")`.
- areas: if you know which metropolitan areas, counties, cities and
towns you want to get data from, you can specify them here by name
or FIPS code, for example,
`c("New York metro", "PLACE = UT62360", "Salt Lake City city, UT")`.
- geo\_headers: In case you do not know which areas to extract data,
- geo_headers: in case you do not know which areas to extract data from,
you can read all the geographic headers specified here and select
areas after reading.
- summary\_level: it determines which summary level data to extract.
- summary_level: it determines which summary level data to extract.
Common ones like “state”, “county”, “place”, “county subdivision”,
“tract”, “block group”, and “block” can be input as plain text.
Others have to be given by code.
- geo\_comp: specifies data of which geographic component you want.
- geo_comp: specifies data of which geographic component you want.
Most common ones are “total”, “urban”, “urbanized area”, “urban
cluster”, and “rural”. Others are provided by code.
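
As an illustration of the `table_contents` naming format described in the list above, the following base R sketch splits strings such as `"male = B01001_002"` into a name and a reference. `parse_table_contents()` is a hypothetical helper written for this example, not a function exported by the package, and the package may handle the format differently internally.

```r
# Hypothetical helper (not part of totalcensus): split "name = reference"
# strings into a data frame of names and references.
parse_table_contents <- function(table_contents) {
  has_name <- grepl("=", table_contents, fixed = TRUE)
  # the reference is whatever follows "=", or the whole string if unnamed
  reference <- trimws(sub("^.*=", "", table_contents))
  # fall back to the reference itself when no name is supplied
  name <- ifelse(has_name,
                 trimws(sub("=.*$", "", table_contents)),
                 reference)
  data.frame(name = name, reference = reference, stringsAsFactors = FALSE)
}

parsed <- parse_table_contents(
  c("male = B01001_002", "female = B01001_026", "B25064_001")
)
print(parsed)
```

Unnamed entries keep their reference as the column name in this sketch, so a mixed vector of named and unnamed table contents still yields one unambiguous label per column.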

Functions `read_acs1year()` and `read_acs5year()` take additional
arguments:

- with\_margin: whether to read margin of error of the estimate.
- dec\_fill: whether to fill geo\_headers codes with data from
decennial census. The codes in ACS summary file are often
incomplete. To use decennial census 2010 data to fill the missing
values, set the argument to “dec2010”.
- with_margin: whether to read margin of error of the estimate.
- dec_fill: whether to fill geo_headers codes with data from decennial
census. The codes in ACS summary file are often incomplete. To use
decennial census 2010 data to fill the missing values, set the
argument to “dec2010”.

### `search_xxxx()` functions

@@ -170,8 +169,7 @@ codes.
The following examples demonstrate how to use these `read_xxx()` and
`search_xxx()` functions.

Examples
--------
## Examples

### Median gross rent in cities with population over 65000

@@ -194,7 +192,7 @@ RStudio. You can provide keywords to search in the function but it is
better to do the search in RStudio with filters. There are so many
tables that contain the string “rent”. It takes some time to find the right
one if you are not familiar with ACS tables. After some struggle, we
think B25064\_001 is what we want.
think B25064_001 is what we want.

We do not need to specify `areas` and `geo_headers` as we are extracting
all geographic areas that match the conditions.
@@ -329,8 +327,7 @@ ggmap(south_bend) +

![](figures/south_bend_block_black.png)

Downloading data
----------------
## Downloading data

This package requires downloading census data to your local computer.
You will be asked to download data when you call `read_xxxx` functions.
Binary file modified data/dict_acs5_table.RData
Binary file not shown.
Binary file added data/lookup_acs5year_2020.RData
Binary file not shown.
Binary file added data_raw/acs/ACS_2020_SF_5YR_Appendices.xlsx
Binary file not shown.