update README

GL-Li · Jul 3, 2020 · a28c46e
1 parent e462959

Showing 3 changed files with 94 additions and 59 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -21,11 +21,11 @@ knitr::opts_chunk$set(
 
 # Extract Decennial Census and American Community Survey Data
 
-Download summary files from [Census Bureau](https://www2.census.gov/) and extract data of decennial censuses and American Community Surveys from your local computer.
+Download summary files from [Census Bureau](https://www2.census.gov/) and extract data from the summary files.
 
 
 ## Update
-**1/8/2020**: Version 0.6.3 is on CRAN. The 2018 ACS 5 year data was added to the package. The package now includes:
+**1/8/2020**: Version 0.6.3 is on CRAN. The 2018 ACS 5 year data was added to the package. The package now includes all latest data since 2000:
 
 - Decennial census 2000 and 2010
 - ACS 1 year: 2005 - 2018
@@ -54,20 +54,18 @@ set_path_to_census("xxxxx/my_census_data")
 
 
 
-## Why another R census package
+## Introduction
 
-The [census API](https://www.census.gov/data/developers/guidance/api-user-guide.Available_Data.html) offers most data in decennial censuses and ACS estimates for download and API-based packages such as `tidycensus`, `censusapi` and `acs` make the downloading very convenient in R. So why we need another package?
+This package extracts data directly from summary files of Decennial Censuses and American Community Surveys (ACS). The summary files store the summary data compiled directly from the original survey questionnaires filled out by each household. They are the most comprehensive datasets available to the public. By directly accessing the summary files, we are able to extract any data offered by Decennial Census and ACS.
 
-One advantage is that once you downloaded the summary files, you do not need internet anymore and everything is on your own computer. You do not need to worry about internet interruption or government shutdown. You have total control of the data.
+Once the summary files are downloaded to your computer, it is particularly fast and convenient to extract high-resolution data at the census tract, block group, and block levels for a large area.
 
-Another benefit of using package `totalcensus` is that it makes census data extraction more flexible. It is particularly convenient to extract high resolution data at census tract, block group, and block level for a large area.
-
-Here is an example of how we extract the median home values in **all** block groups in the United States from 2011-2015 ACS 5-year survey with this package. You simply need to call the function `read_acs5year()`. It takes 15 seconds for my 4-years old laptop to return the data of all 217,739 block groups. In addition to the table contents we request, we also get the population and coordinate of each block group.
+Here is an example of how we extract the median home values in **all** block groups in the United States from 2011-2015 ACS 5-year survey with this package. You simply need to call the function `read_acs5year()`. It takes 15 seconds for my 7-year-old laptop to return the data of all 217,739 block groups. In addition to the table contents we request, we also get the population and coordinate of each block group.
 
 ```{r eval = FALSE}
 library(totalcensus)
 home_national <- read_acs5year(
-    year = 2015,
+    year = 2018,
     states = states_DC,   # all 50 states plus DC
     table_contents = "home_value = B25077_001",
     summary_level = "block group"
@@ -81,7 +79,7 @@ With the coordinates, we can visualize the data on US map with `ggplot2` and `gg
 There are additional benefits of using this package:
 
 - You can get detailed urban/rural data from Census 2010. This package use summary file 1 with urban/rural update, while the census API only provide data in summary file 1 before urban/rural update. 
-- You can get all block groups that belong or partially belong to a city. Original census data do not provide city information for a block group as a block group may not uniquely belong to a city. However, large cities have most block groups within their boundaries and only a small number of block groups run across the borders. The block group level data provide valuable spatial information of a city. This is particularly helpful for ACS 5-year surveys which cover data down to the level of block groups. 
+- You can get all block groups that belong or partially belong to a city. Original census data do not provide city information for a block group as a block group may not exclusively belong to a city.
 - It provides longitude and latitude of the internal point of a geographic area for easy and quick mapping. You do not always need shape files to make nice maps, as in the map shown above. 
 
 
@@ -93,8 +91,8 @@ There are additional benefits of using this package:
 
 
 
-## Basic application
-### the `read_xxxx()` functions
+## How to use the package
+### `read_xxxx()` functions
 The package has three functions to read decennial census, ACS 5-year survey, and ACS 1-year survey: `read_decennial()`, `read_acs5year()`, and `read_acs1year()`. They are similar but as these datasets are so different, we prefer to keep three separate functions, one for each. 
 
 The function arguments serve as filters to select the data you want:
@@ -112,29 +110,30 @@ Functions `read_acs1year()` and `read_acs5year()` have additional argument:
 - with_margin: whether to read margin of error of the estimate.
 - dec_fill: whether to fill geo_headers codes with data from decennial census. The codes in ACS summary file are often incomplete. To use decennial census 2010 data to fill the missing values, set the argument to "dec2010".
 
-### the `search_xxxx()` functions
+### `search_xxxx()` functions
 
 There are a family of `search_xxx()` functions to help find table contents, geoheaders, summary levels, geocomponents, FIPS codes and CBSA codes. 
 
 The following examples demonstrate how to use these `read_xxx()` and `search_xxx()` functions.
 
 
+## Examples
 ### Median gross rent in cities with population over 65000
 A property management company wants to know the most recent rents in major cities in the US. How to get the data?
 
-We first need to determine which survey to read. For most recent survey data, we want to read 2016 ACS 1-year estimates, which provide data for geographic areas with population over 65000.
+We first need to determine which survey to read. For most recent survey data, we want to read 2018 ACS 1-year estimates, which provide data for geographic areas with population over 65000.
 
 We also need to determine which data files to read. We know summary level of cities is "160" or "place". Browsing with `search_summarylevels("acs1")`, we see that this summary level is only in state files of ACS 1-year estimates. So we will read all the state files.
 
-Then we need to check if 2016 ACS 1-year estimate has the rent data. We run `search_tablecontents("acs1")` to open the dataset with `View()` in RStudio. You can provide keywords to search in the function but it is better to do the search in RStudio with filters. There are so many tables that contains string "rent". It takes some time to find the right one if you are not familiar with ACS tables. After some struggle, we think  B25064_001 is what we want.
+Then we need to check if 2018 ACS 1-year estimate has the rent data. We run `search_tablecontents("acs1")` to open the dataset with `View()` in RStudio. You can provide keywords to search in the function but it is better to do the search in RStudio with filters. There are so many tables that contain the string "rent". It takes some time to find the right one if you are not familiar with ACS tables. After some struggle, we think B25064_001 is what we want.
 
 We do not need to specify `areas` and `geo_headers` as we are extracting all geographic areas matches the conditions.
 
-Below is the code that gives what we want. The first time you use `read_xxxx()` functions to read data files, you will be asked to download data generated from decennial census 2010 and summary files required for this function call, in this case, 2016 ACS 1-year summary files. Choose 1 to continue. 
+Below is the code that gives what we want. The first time you use `read_xxxx()` functions to read data files, you will be asked to download data generated from decennial census 2010 and summary files required for this function call, in this case, 2018 ACS 1-year summary files. Choose 1 to continue. 
 
 ```{r eval = FALSE}
 rent <- read_acs1year(
-    year = 2016,
+    year = 2018,
     states = states_DC,
     table_contents = "rent = B25064_001",
     summary_level = "place"

diff --git a/README.md b/README.md
@@ -9,14 +9,14 @@ Extract Decennial Census and American Community Survey Data
 ===========================================================
 
 Download summary files from [Census Bureau](https://www2.census.gov/)
-and extract data of decennial censuses and American Community Surveys
-from your local computer.
+and extract data from the summary files.
 
 Update
 ------
 
 **1/8/2020**: Version 0.6.3 is on CRAN. The 2018 ACS 5 year data was
-added to the package. The package now includes:
+added to the package. The package now includes all latest data since
+2000:
 
 -   Decennial census 2000 and 2010
 -   ACS 1 year: 2005 - 2018
@@ -48,36 +48,32 @@ library(totalcensus)
 set_path_to_census("xxxxx/my_census_data")
 ```
 
-Why another R census package
-----------------------------
+Introduction
+------------
 
-The [census
-API](https://www.census.gov/data/developers/guidance/api-user-guide.Available_Data.html)
-offers most data in decennial censuses and ACS estimates for download
-and API-based packages such as `tidycensus`, `censusapi` and `acs` make
-the downloading very convenient in R. So why we need another package?
+This package extracts data directly from summary files of Decennial
+Censuses and American Community Surveys (ACS). The summary files store
+the summary data compiled directly from the original survey
+questionnaires filled out by each household. They are the most
+comprehensive datasets available to the public. By directly accessing
+the summary files, we are able to extract any data offered by Decennial
+Census and ACS.
 
-One advantage is that once you downloaded the summary files, you do not
-need internet anymore and everything is on your own computer. You do not
-need to worry about internet interruption or government shutdown. You
-have total control of the data.
-
-Another benefit of using package `totalcensus` is that it makes census
-data extraction more flexible. It is particularly convenient to extract
-high resolution data at census tract, block group, and block level for a
-large area.
+Once the summary files are downloaded to your computer, it is
+particularly fast and convenient to extract high-resolution data at the
+census tract, block group, and block levels for a large area.
 
 Here is an example of how we extract the median home values in **all**
 block groups in the United States from 2011-2015 ACS 5-year survey with
 this package. You simply need to call the function `read_acs5year()`. It
-takes 15 seconds for my 4-years old laptop to return the data of all
+takes 15 seconds for my 7-year-old laptop to return the data of all
 217,739 block groups. In addition to the table contents we request, we
 also get the population and coordinate of each block group.
 
 ``` r
 library(totalcensus)
 home_national <- read_acs5year(
-    year = 2015,
+    year = 2018,
     states = states_DC,   # all 50 states plus DC
     table_contents = "home_value = B25077_001",
     summary_level = "block group"
@@ -99,12 +95,7 @@ There are additional benefits of using this package:
     only provide data in summary file 1 before urban/rural update.
 -   You can get all block groups that belong or partially belong to a
     city. Original census data do not provide city information for a
-    block group as a block group may not uniquely belong to a city.
-    However, large cities have most block groups within their boundaries
-    and only a small number of block groups run across the borders. The
-    block group level data provide valuable spatial information of a
-    city. This is particularly helpful for ACS 5-year surveys which
-    cover data down to the level of block groups.
+    block group as a block group may not exclusively belong to a city.
 -   It provides longitude and latitude of the internal point of a
     geographic area for easy and quick mapping. You do not always need
     shape files to make nice maps, as in the map shown above.
@@ -124,10 +115,10 @@ There are additional benefits of using this package:
     entities](https://gl-li.netlify.com/2017/12/28/use-totalcensus-package-to-determine-relationship-between-geographic-entities/);
     an application example.
 
-Basic application
------------------
+How to use the package
+----------------------
 
-### the `read_xxxx()` functions
+### `read_xxxx()` functions
 
 The package has three functions to read decennial census, ACS 5-year
 survey, and ACS 1-year survey: `read_decennial()`, `read_acs5year()`,
@@ -169,7 +160,7 @@ argument:
     incomplete. To use decennial census 2010 data to fill the missing
     values, set the argument to “dec2010”.
 
-### the `search_xxxx()` functions
+### `search_xxxx()` functions
 
 There are a family of `search_xxx()` functions to help find table
 contents, geoheaders, summary levels, geocomponents, FIPS codes and CBSA
@@ -178,13 +169,16 @@ codes.
 The following examples demonstrate how to use these `read_xxx()` and
 `search_xxx()` functions.
 
+Examples
+--------
+
 ### Median gross rent in cities with population over 65000
 
 A property management company wants to know the most recent rents in
 major cities in the US. How to get the data?
 
 We first need to determine which survey to read. For most recent survey
-data, we want to read 2016 ACS 1-year estimates, which provide data for
+data, we want to read 2018 ACS 1-year estimates, which provide data for
 geographic areas with population over 65000.
 
 We also need to determine which data files to read. We know summary
@@ -193,7 +187,7 @@ level of cities is “160” or “place”. Browsing with
 in state files of ACS 1-year estimates. So we will read all the state
 files.
 
-Then we need to check if 2016 ACS 1-year estimate has the rent data. We
+Then we need to check if 2018 ACS 1-year estimate has the rent data. We
 run `search_tablecontents("acs1")` to open the dataset with `View()` in
 RStudio. You can provide keywords to search in the function but it is
 better to do the search in RStudio with filters. There are so many
@@ -207,12 +201,12 @@ all geographic areas matches the conditions.
 Below is the code that gives what we want. The first time you use
 `read_xxxx()` functions to read data files, you will be asked to
 download data generated from decennial census 2010 and summary files
-required for this function call, in this case, 2016 ACS 1-year summary
+required for this function call, in this case, 2018 ACS 1-year summary
 files. Choose 1 to continue.
 
 ``` r
 rent <- read_acs1year(
-    year = 2016,
+    year = 2018,
     states = states_DC,
     table_contents = "rent = B25064_001",
     summary_level = "place"

diff --git a/why_this_package/why_this_package.Rmd b/why_this_package/why_this_package.Rmd
@@ -126,14 +126,13 @@ ggsave(filename = "why_this_package/prov_urban_rural_population.png")
 The data is in 2011-2015 ACS 5-year survey.
 
 ```{r}
-prov_home <- read_acs5year(states = c("MA", "RI"),
-                           year = 2015,
+prov_home <- read_acs5year(year = 2018,
+                           states = c("MA", "RI"),
                            geo_headers = "CBSA",
-                           table_contents = c("B01003_001", "B25077_001"),
-                           summary_level = "block_group",
-                           with_margin = FALSE) %>%
+                           table_contents = "value = B25077_001",
+                           summary_level = "block group",
+                           dec_fill = "dec2010") %>%
     .[CBSA == "39300"] %>%
-    setnames(c("B01003_001_e", "B25077_001_e"), c("population", "value")) %>%
     # some missing value in home value shown as "." and so the whole column was
     # read into character. change column back to numeric and remove NAs
     .[, value := as.numeric(value)] %>%
@@ -151,7 +150,7 @@ ggmap(prov_map9) +
     scale_size_area(max_size = 1) +
     scale_color_continuous(low = "green", high = "red",
                            breaks = c(50000, 200000, 600000, 1000000),
-                           labels = scales::unit_format("K", 1e-3)) +
+                           labels = scales::unit_format(unit = "K", scale = 1e-3)) +
     guides(size = FALSE) +
     labs(color = "value ($)",
          caption = "Source: ACS 5-year survey 2011-2015",
@@ -165,3 +164,46 @@ ggsave(file = "why_this_package/prov_home_values.png")
 ```
 
 ![home value](prov_home_values.png)
+
+## Median home value of each block group in the continental US
+```{r}
+library(totalcensus)
+library(data.table)
+library(ggmap)
+us_map <- get_map("US", zoom = 4, color = "bw")
+
+home_national <- read_acs5year(
+    year = 2018,
+    states = states_DC,   # all 50 states plus DC
+    table_contents = "home_value = B25077_001",
+    summary_level = "block group"
+) %>%
+    .[, home_value := as.numeric(home_value)] %>%
+    .[!is.na(home_value)] %>%
+    .[order(home_value)]
+
+ggmap(us_map) +
+    geom_point(data = home_national,
+               aes(lon, lat, size = population, color = home_value),
+               alpha = 1) +
+    ylim(25, 49) +
+    scale_size_area(max_size = 1) +
+    scale_color_continuous(
+        low = "green", 
+        high = "red",
+        breaks = c(100000, 500000, 1000000, 1500000, 2000000),
+        labels = scales::unit_format(unit = "K", scale = 1e-3)
+    ) +
+    guides(size = FALSE) +
+    labs(color = "value ($)",
+         caption = "Source: ACS 5-year survey 2014-2018",
+         title = "Median Home Values in each block group") +
+    theme(axis.title = element_blank(),
+          axis.text = element_blank(),
+          axis.ticks = element_blank(),
+          legend.position = c(0.95, 0.1),
+          legend.key = element_blank(),
+          legend.margin = margin(0, 0, 0, 0),
+          legend.background = element_blank())
+```
+