From a28c46e20a41b5c3980ab488867672b4896bc512 Mon Sep 17 00:00:00 2001 From: GL-Li Date: Fri, 3 Jul 2020 08:25:20 -0400 Subject: [PATCH] update README --- README.Rmd | 33 +++++++------- README.md | 64 ++++++++++++--------------- why_this_package/why_this_package.Rmd | 56 ++++++++++++++++++++--- 3 files changed, 94 insertions(+), 59 deletions(-) diff --git a/README.Rmd b/README.Rmd index 02de9bb..4d1babb 100644 --- a/README.Rmd +++ b/README.Rmd @@ -21,11 +21,11 @@ knitr::opts_chunk$set( # Extract Decennial Census and American Community Survey Data -Download summary files from [Census Bureau](https://www2.census.gov/) and extract data of decennial censuses and American Community Surveys from your local computer. +Download summary files from [Census Bureau](https://www2.census.gov/) and extract data from the summary files. ## Update -**1/8/2020**: Version 0.6.3 is on CRAN. The 2018 ACS 5 year data was added to the package. The package now includes: +**1/8/2020**: Version 0.6.3 is on CRAN. The 2018 ACS 5 year data was added to the package. The package now includes all the latest data since 2000: - Decennial census 2000 and 2010 - ACS 1 year: 2005 - 2018 @@ -54,20 +54,18 @@ set_path_to_census("xxxxx/my_census_data") -## Why another R census package +## Introduction -The [census API](https://www.census.gov/data/developers/guidance/api-user-guide.Available_Data.html) offers most data in decennial censuses and ACS estimates for download and API-based packages such as `tidycensus`, `censusapi` and `acs` make the downloading very convenient in R. So why we need another package? +This package extracts data directly from summary files of Decennial Censuses and American Community Surveys (ACS). The summary files store the summary data compiled directly from the original survey questionnaires filled out by each household. They are the most comprehensive datasets available to the public. 
By directly accessing the summary files, we are able to extract any data offered by Decennial Census and ACS. -One advantage is that once you downloaded the summary files, you do not need internet anymore and everything is on your own computer. You do not need to worry about internet interruption or government shutdown. You have total control of the data. +Once the summary files are downloaded to your computer, it is particularly fast and convenient to extract high resolution data at census tract, block group, and block level for a large area. -Another benefit of using package `totalcensus` is that it makes census data extraction more flexible. It is particularly convenient to extract high resolution data at census tract, block group, and block level for a large area. - -Here is an example of how we extract the median home values in **all** block groups in the United States from 2011-2015 ACS 5-year survey with this package. You simply need to call the function `read_acs5year()`. It takes 15 seconds for my 4-years old laptop to return the data of all 217,739 block groups. In addition to the table contents we request, we also get the population and coordinate of each block group. +Here is an example of how we extract the median home values in **all** block groups in the United States from 2011-2015 ACS 5-year survey with this package. You simply need to call the function `read_acs5year()`. It takes 15 seconds for my 7-year-old laptop to return the data of all 217,739 block groups. In addition to the table contents we request, we also get the population and coordinate of each block group. 
```{r eval = FALSE} library(totalcensus) home_national <- read_acs5year( - year = 2015, + year = 2018, states = states_DC, # all 50 states plus DC table_contents = "home_value = B25077_001", summary_level = "block group" @@ -81,7 +79,7 @@ With the coordinates, we can visualize the data on US map with `ggplot2` and `gg There are additional benefits of using this package: - You can get detailed urban/rural data from Census 2010. This package use summary file 1 with urban/rural update, while the census API only provide data in summary file 1 before urban/rural update. -- You can get all block groups that belong or partially belong to a city. Original census data do not provide city information for a block group as a block group may not uniquely belong to a city. However, large cities have most block groups within their boundaries and only a small number of block groups run across the borders. The block group level data provide valuable spatial information of a city. This is particularly helpful for ACS 5-year surveys which cover data down to the level of block groups. +- You can get all block groups that belong or partially belong to a city. Original census data do not provide city information for a block group as a block group may not exclusively belong to a city. - It provides longitude and latitude of the internal point of a geographic area for easy and quick mapping. You do not always need shape files to make nice maps, as in the map shown above. @@ -93,8 +91,8 @@ There are additional benefits of using this package: -## Basic application -### the `read_xxxx()` functions +## How to use the package +### `read_xxxx()` functions The package has three functions to read decennial census, ACS 5-year survey, and ACS 1-year survey: `read_decennial()`, `read_acs5year()`, and `read_acs1year()`. They are similar but as these datasets are so different, we prefer to keep three separate functions, one for each. 
The function arguments serve as filters to select the data you want: @@ -112,29 +110,30 @@ Functions `read_acs1year()` and `read_acs5year()` have additional argument: - with_margin: whether to read margin of error of the estimate. - dec_fill: whether to fill geo_headers codes with data from decennial census. The codes in ACS summary file are often incomplete. To use decennial census 2010 data to fill the missing values, set the argument to "dec2010". -### the `search_xxxx()` functions +### `search_xxxx()` functions There are a family of `search_xxx()` functions to help find table contents, geoheaders, summary levels, geocomponents, FIPS codes and CBSA codes. The following examples demonstrate how to use these `read_xxx()` and `search_xxx()` functions. +## Examples ### Median gross rent in cities with population over 65000 A property management company wants to know the most recent rents in major cities in the US. How to get the data? -We first need to determine which survey to read. For most recent survey data, we want to read 2016 ACS 1-year estimates, which provide data for geographic areas with population over 65000. +We first need to determine which survey to read. For most recent survey data, we want to read 2018 ACS 1-year estimates, which provide data for geographic areas with population over 65000. We also need to determine which data files to read. We know summary level of cities is "160" or "place". Browsing with `search_summarylevels("acs1")`, we see that this summary level is only in state files of ACS 1-year estimates. So we will read all the state files. -Then we need to check if 2016 ACS 1-year estimate has the rent data. We run `search_tablecontents("acs1")` to open the dataset with `View()` in RStudio. You can provide keywords to search in the function but it is better to do the search in RStudio with filters. There are so many tables that contains string "rent". It takes some time to find the right one if you are not familiar with ACS tables. 
After some struggle, we think B25064_001 is what we want. +Then we need to check if 2018 ACS 1-year estimate has the rent data. We run `search_tablecontents("acs1")` to open the dataset with `View()` in RStudio. You can provide keywords to search in the function but it is better to do the search in RStudio with filters. There are so many tables that contains string "rent". It takes some time to find the right one if you are not familiar with ACS tables. After some struggle, we think B25064_001 is what we want. We do not need to specify `areas` and `geo_headers` as we are extracting all geographic areas matches the conditions. -Below is the code that gives what we want. The first time you use `read_xxxx()` functions to read data files, you will be asked to download data generated from decennial census 2010 and summary files required for this function call, in this case, 2016 ACS 1-year summary files. Choose 1 to continue. +Below is the code that gives what we want. The first time you use `read_xxxx()` functions to read data files, you will be asked to download data generated from decennial census 2010 and summary files required for this function call, in this case, 2018 ACS 1-year summary files. Choose 1 to continue. ```{r eval = FALSE} rent <- read_acs1year( - year = 2016, + year = 2018, states = states_DC, table_contents = "rent = B25064_001", summary_level = "place" diff --git a/README.md b/README.md index 2f52855..69996b0 100644 --- a/README.md +++ b/README.md @@ -9,14 +9,14 @@ Extract Decennial Census and American Community Survey Data =========================================================== Download summary files from [Census Bureau](https://www2.census.gov/) -and extract data of decennial censuses and American Community Surveys -from your local computer. +and extract data from the summary files. Update ------ **1/8/2020**: Version 0.6.3 is on CRAN. The 2018 ACS 5 year data was -added to the package. The package now includes: +added to the package. 
The package now includes all the latest data since +2000: - Decennial census 2000 and 2010 - ACS 1 year: 2005 - 2018 @@ -48,36 +48,32 @@ library(totalcensus) set_path_to_census("xxxxx/my_census_data") ``` -Why another R census package ----------------------------- +Introduction +------------ -The [census -API](https://www.census.gov/data/developers/guidance/api-user-guide.Available_Data.html) -offers most data in decennial censuses and ACS estimates for download -and API-based packages such as `tidycensus`, `censusapi` and `acs` make -the downloading very convenient in R. So why we need another package? +This package extracts data directly from summary files of Decennial +Censuses and American Community Surveys (ACS). The summary files store +the summary data compiled directly from the original survey +questionnaires filled out by each household. They are the most +comprehensive datasets available to the public. By directly accessing +the summary files, we are able to extract any data offered by Decennial +Census and ACS. -One advantage is that once you downloaded the summary files, you do not -need internet anymore and everything is on your own computer. You do not -need to worry about internet interruption or government shutdown. You -have total control of the data. - -Another benefit of using package `totalcensus` is that it makes census -data extraction more flexible. It is particularly convenient to extract -high resolution data at census tract, block group, and block level for a -large area. +Once the summary files are downloaded to your computer, it is +particularly fast and convenient to extract high resolution data at +census tract, block group, and block level for a large area. Here is an example of how we extract the median home values in **all** block groups in the United States from 2011-2015 ACS 5-year survey with this package. You simply need to call the function `read_acs5year()`. 
It -takes 15 seconds for my 4-years old laptop to return the data of all +takes 15 seconds for my 7-year-old laptop to return the data of all 217,739 block groups. In addition to the table contents we request, we also get the population and coordinate of each block group. ``` r library(totalcensus) home_national <- read_acs5year( - year = 2015, + year = 2018, states = states_DC, # all 50 states plus DC table_contents = "home_value = B25077_001", summary_level = "block group" @@ -99,12 +95,7 @@ There are additional benefits of using this package: only provide data in summary file 1 before urban/rural update. - You can get all block groups that belong or partially belong to a city. Original census data do not provide city information for a - block group as a block group may not uniquely belong to a city. - However, large cities have most block groups within their boundaries - and only a small number of block groups run across the borders. The - block group level data provide valuable spatial information of a - city. This is particularly helpful for ACS 5-year surveys which - cover data down to the level of block groups. + block group as a block group may not exclusively belong to a city. - It provides longitude and latitude of the internal point of a geographic area for easy and quick mapping. You do not always need shape files to make nice maps, as in the map shown above. @@ -124,10 +115,10 @@ There are additional benefits of using this package: entities](https://gl-li.netlify.com/2017/12/28/use-totalcensus-package-to-determine-relationship-between-geographic-entities/); an application example. -Basic application ------------------ +How to use the package +---------------------- -### the `read_xxxx()` functions +### `read_xxxx()` functions The package has three functions to read decennial census, ACS 5-year survey, and ACS 1-year survey: `read_decennial()`, `read_acs5year()`, @@ -169,7 +160,7 @@ argument: incomplete. 
To use decennial census 2010 data to fill the missing values, set the argument to “dec2010”. -### the `search_xxxx()` functions +### `search_xxxx()` functions There are a family of `search_xxx()` functions to help find table contents, geoheaders, summary levels, geocomponents, FIPS codes and CBSA @@ -178,13 +169,16 @@ codes. The following examples demonstrate how to use these `read_xxx()` and `search_xxx()` functions. +Examples +-------- + ### Median gross rent in cities with population over 65000 A property management company wants to know the most recent rents in major cities in the US. How to get the data? We first need to determine which survey to read. For most recent survey -data, we want to read 2016 ACS 1-year estimates, which provide data for +data, we want to read 2018 ACS 1-year estimates, which provide data for geographic areas with population over 65000. We also need to determine which data files to read. We know summary @@ -193,7 +187,7 @@ level of cities is “160” or “place”. Browsing with in state files of ACS 1-year estimates. So we will read all the state files. -Then we need to check if 2016 ACS 1-year estimate has the rent data. We +Then we need to check if 2018 ACS 1-year estimate has the rent data. We run `search_tablecontents("acs1")` to open the dataset with `View()` in RStudio. You can provide keywords to search in the function but it is better to do the search in RStudio with filters. There are so many @@ -207,12 +201,12 @@ all geographic areas matches the conditions. Below is the code that gives what we want. The first time you use `read_xxxx()` functions to read data files, you will be asked to download data generated from decennial census 2010 and summary files -required for this function call, in this case, 2016 ACS 1-year summary +required for this function call, in this case, 2018 ACS 1-year summary files. Choose 1 to continue. 
``` r rent <- read_acs1year( - year = 2016, + year = 2018, states = states_DC, table_contents = "rent = B25064_001", summary_level = "place" diff --git a/why_this_package/why_this_package.Rmd b/why_this_package/why_this_package.Rmd index 5e88fed..0ce6025 100644 --- a/why_this_package/why_this_package.Rmd +++ b/why_this_package/why_this_package.Rmd @@ -126,14 +126,13 @@ ggsave(filename = "why_this_package/prov_urban_rural_population.png") The data is in 2011-2015 ACS 5-year survey. ```{r} -prov_home <- read_acs5year(states = c("MA", "RI"), - year = 2015, +prov_home <- read_acs5year(year = 2018, + states = c("MA", "RI"), geo_headers = "CBSA", - table_contents = c("B01003_001", "B25077_001"), - summary_level = "block_group", - with_margin = FALSE) %>% + table_contents = "value = B25077_001", + summary_level = "block group", + dec_fill = "dec2010") %>% .[CBSA == "39300"] %>% - setnames(c("B01003_001_e", "B25077_001_e"), c("population", "value")) %>% # some missing value in home value shown as "." and so the whole column was # read into character. 
change column back to numeric and remove NAs .[, value := as.numeric(value)] %>% @@ -151,7 +150,7 @@ ggmap(prov_map9) + scale_size_area(max_size = 1) + scale_color_continuous(low = "green", high = "red", breaks = c(50000, 200000, 600000, 1000000), - labels = scales::unit_format("K", 1e-3)) + + labels = scales::unit_format(unit = "K", scale = 1e-3)) + guides(size = FALSE) + labs(color = "value ($)", caption = "Source: ACS 5-year survey 2011-2015", @@ -165,3 +164,46 @@ ggsave(file = "why_this_package/prov_home_values.png") ``` ![home value](prov_home_values.png) + +## median house value of each block group in continental USA +```{r} +library(totalcensus) +library(data.table) +library(ggmap) +us_map <- get_map("US", zoom = 4, color = "bw") + +home_national <- read_acs5year( + year = 2018, + states = states_DC, # all 50 states plus DC + table_contents = "home_value = B25077_001", + summary_level = "block group" +) %>% + .[, home_value := as.numeric(home_value)] %>% + .[!is.na(home_value)] %>% + .[order(home_value)] + +ggmap(us_map) + + geom_point(data = home_national, + aes(lon, lat, size = population, color = home_value), + alpha = 1) + + ylim(25, 49) + + scale_size_area(max_size = 1) + + scale_color_continuous( + low = "green", + high = "red", + breaks = c(100000, 500000, 1000000, 1500000, 2000000), + labels = scales::unit_format(unit = "K", scale = 1e-3) + ) + + guides(size = FALSE) + + labs(color = "value ($)", + caption = "Source: ACS 5-year survey 2014-2018", + title = "Median Home Values in each block group") + + theme(axis.title = element_blank(), + axis.text = element_blank(), + axis.ticks = element_blank(), + legend.position = c(0.95, 0.1), + legend.key = element_blank(), + legend.margin = margin(0, 0, 0, 0), + legend.background = element_blank()) +``` +