Commit

update README
GL-Li committed Jul 3, 2020
1 parent e462959 commit a28c46e
Showing 3 changed files with 94 additions and 59 deletions.
33 changes: 16 additions & 17 deletions README.Rmd
@@ -21,11 +21,11 @@ knitr::opts_chunk$set(

# Extract Decennial Census and American Community Survey Data

Download summary files from [Census Bureau](https://www2.census.gov/) and extract data of decennial censuses and American Community Surveys from your local computer.
Download summary files from [Census Bureau](https://www2.census.gov/) and extract data from the summary files.


## Update
**1/8/2020**: Version 0.6.3 is on CRAN. The 2018 ACS 5 year data was added to the package. The package now includes:
**1/8/2020**: Version 0.6.3 is on CRAN. The 2018 ACS 5 year data was added to the package. The package now includes all latest data since 2000:

- Decennial census 2000 and 2010
- ACS 1 year: 2005 - 2018
@@ -54,20 +54,18 @@ set_path_to_census("xxxxx/my_census_data")



## Why another R census package
## Introduction

The [census API](https://www.census.gov/data/developers/guidance/api-user-guide.Available_Data.html) offers most data in decennial censuses and ACS estimates for download and API-based packages such as `tidycensus`, `censusapi` and `acs` make the downloading very convenient in R. So why we need another package?
This package extracts data directly from the summary files of Decennial Censuses and American Community Surveys (ACS). The summary files store the summary data compiled directly from the original survey questionnaires filled out by each household. They are the most comprehensive datasets available to the public. By directly accessing the summary files, we are able to extract any data offered by the Decennial Census and ACS.

One advantage is that once you downloaded the summary files, you do not need internet anymore and everything is on your own computer. You do not need to worry about internet interruption or government shutdown. You have total control of the data.
With the summary files downloaded to your computer, it is particularly fast and convenient to extract high resolution data at census tract, block group, and block level for a large area.

Another benefit of using package `totalcensus` is that it makes census data extraction more flexible. It is particularly convenient to extract high resolution data at census tract, block group, and block level for a large area.

Here is an example of how we extract the median home values in **all** block groups in the United States from 2011-2015 ACS 5-year survey with this package. You simply need to call the function `read_acs5year()`. It takes 15 seconds for my 4-years old laptop to return the data of all 217,739 block groups. In addition to the table contents we request, we also get the population and coordinate of each block group.
Here is an example of how we extract the median home values in **all** block groups in the United States from the 2014-2018 ACS 5-year survey with this package. You simply need to call the function `read_acs5year()`. It takes 15 seconds for my 7-year-old laptop to return the data of all 217,739 block groups. In addition to the table contents we request, we also get the population and coordinates of each block group.

```{r eval = FALSE}
library(totalcensus)
home_national <- read_acs5year(
year = 2015,
year = 2018,
states = states_DC, # all 50 states plus DC
table_contents = "home_value = B25077_001",
summary_level = "block group"
@@ -81,7 +79,7 @@ With the coordinates, we can visualize the data on US map with `ggplot2` and `gg
There are additional benefits of using this package:

- You can get detailed urban/rural data from Census 2010. This package uses summary file 1 with urban/rural update, while the census API only provides data in summary file 1 before urban/rural update.
- You can get all block groups that belong or partially belong to a city. Original census data do not provide city information for a block group as a block group may not uniquely belong to a city. However, large cities have most block groups within their boundaries and only a small number of block groups run across the borders. The block group level data provide valuable spatial information of a city. This is particularly helpful for ACS 5-year surveys which cover data down to the level of block groups.
- You can get all block groups that belong or partially belong to a city. Original census data do not provide city information for a block group as a block group may not exclusively belong to a city.
- It provides longitude and latitude of the internal point of a geographic area for easy and quick mapping. You do not always need shape files to make nice maps, as in the map shown above.


@@ -93,8 +91,8 @@ There are additional benefits of using this package:



## Basic application
### the `read_xxxx()` functions
## How to use the package
### `read_xxxx()` functions
The package has three functions to read decennial census, ACS 5-year survey, and ACS 1-year survey: `read_decennial()`, `read_acs5year()`, and `read_acs1year()`. They are similar but as these datasets are so different, we prefer to keep three separate functions, one for each.

The function arguments serve as filters to select the data you want:
@@ -112,29 +110,30 @@ Functions `read_acs1year()` and `read_acs5year()` have additional argument:
- with_margin: whether to read margin of error of the estimate.
- dec_fill: whether to fill geo_headers codes with data from decennial census. The codes in ACS summary file are often incomplete. To use decennial census 2010 data to fill the missing values, set the argument to "dec2010".

### the `search_xxxx()` functions
### `search_xxxx()` functions

There is a family of `search_xxx()` functions to help find table contents, geoheaders, summary levels, geocomponents, FIPS codes and CBSA codes.

The following examples demonstrate how to use these `read_xxx()` and `search_xxx()` functions.


## Examples
### Median gross rent in cities with population over 65000
A property management company wants to know the most recent rents in major cities in the US. How to get the data?

We first need to determine which survey to read. For most recent survey data, we want to read 2016 ACS 1-year estimates, which provide data for geographic areas with population over 65000.
We first need to determine which survey to read. For most recent survey data, we want to read 2018 ACS 1-year estimates, which provide data for geographic areas with population over 65000.

We also need to determine which data files to read. We know summary level of cities is "160" or "place". Browsing with `search_summarylevels("acs1")`, we see that this summary level is only in state files of ACS 1-year estimates. So we will read all the state files.

Then we need to check if 2016 ACS 1-year estimate has the rent data. We run `search_tablecontents("acs1")` to open the dataset with `View()` in RStudio. You can provide keywords to search in the function but it is better to do the search in RStudio with filters. There are so many tables that contains string "rent". It takes some time to find the right one if you are not familiar with ACS tables. After some struggle, we think B25064_001 is what we want.
Then we need to check if the 2018 ACS 1-year estimate has the rent data. We run `search_tablecontents("acs1")` to open the dataset with `View()` in RStudio. You can provide keywords to search in the function but it is better to do the search in RStudio with filters. There are many tables that contain the string "rent". It takes some time to find the right one if you are not familiar with ACS tables. After some struggle, we think B25064_001 is what we want.

We do not need to specify `areas` and `geo_headers` as we are extracting all geographic areas that match the conditions.

Below is the code that gives what we want. The first time you use `read_xxxx()` functions to read data files, you will be asked to download data generated from decennial census 2010 and summary files required for this function call, in this case, 2016 ACS 1-year summary files. Choose 1 to continue.
Below is the code that gives what we want. The first time you use `read_xxxx()` functions to read data files, you will be asked to download data generated from decennial census 2010 and summary files required for this function call, in this case, 2018 ACS 1-year summary files. Choose 1 to continue.

```{r eval = FALSE}
rent <- read_acs1year(
year = 2016,
year = 2018,
states = states_DC,
table_contents = "rent = B25064_001",
summary_level = "place"
64 changes: 29 additions & 35 deletions README.md
@@ -9,14 +9,14 @@ Extract Decennial Census and American Community Survey Data
===========================================================

Download summary files from [Census Bureau](https://www2.census.gov/)
and extract data of decennial censuses and American Community Surveys
from your local computer.
and extract data from the summary files.

Update
------

**1/8/2020**: Version 0.6.3 is on CRAN. The 2018 ACS 5 year data was
added to the package. The package now includes:
added to the package. The package now includes all latest data since
2000:

- Decennial census 2000 and 2010
- ACS 1 year: 2005 - 2018
@@ -48,36 +48,32 @@ library(totalcensus)
set_path_to_census("xxxxx/my_census_data")
```

Why another R census package
----------------------------
Introduction
------------

The [census
API](https://www.census.gov/data/developers/guidance/api-user-guide.Available_Data.html)
offers most data in decennial censuses and ACS estimates for download
and API-based packages such as `tidycensus`, `censusapi` and `acs` make
the downloading very convenient in R. So why we need another package?
This package extracts data directly from the summary files of Decennial
Censuses and American Community Surveys (ACS). The summary files store
the summary data compiled directly from the original survey
questionnaires filled out by each household. They are the most
comprehensive datasets available to the public. By directly accessing
the summary files, we are able to extract any data offered by the
Decennial Census and ACS.

One advantage is that once you downloaded the summary files, you do not
need internet anymore and everything is on your own computer. You do not
need to worry about internet interruption or government shutdown. You
have total control of the data.

Another benefit of using package `totalcensus` is that it makes census
data extraction more flexible. It is particularly convenient to extract
high resolution data at census tract, block group, and block level for a
large area.
With the summary files downloaded to your computer, it is particularly
fast and convenient to extract high resolution data at census tract,
block group, and block level for a large area.

Here is an example of how we extract the median home values in **all**
block groups in the United States from the 2014-2018 ACS 5-year survey with
this package. You simply need to call the function `read_acs5year()`. It
takes 15 seconds for my 4-years old laptop to return the data of all
takes 15 seconds for my 7-year-old laptop to return the data of all
217,739 block groups. In addition to the table contents we request, we
also get the population and coordinates of each block group.

``` r
library(totalcensus)
home_national <- read_acs5year(
year = 2015,
year = 2018,
states = states_DC, # all 50 states plus DC
table_contents = "home_value = B25077_001",
summary_level = "block group"
@@ -99,12 +95,7 @@ There are additional benefits of using this package:
only provides data in summary file 1 before urban/rural update.
- You can get all block groups that belong or partially belong to a
city. Original census data do not provide city information for a
block group as a block group may not uniquely belong to a city.
However, large cities have most block groups within their boundaries
and only a small number of block groups run across the borders. The
block group level data provide valuable spatial information of a
city. This is particularly helpful for ACS 5-year surveys which
cover data down to the level of block groups.
block group as a block group may not exclusively belong to a city.
- It provides longitude and latitude of the internal point of a
geographic area for easy and quick mapping. You do not always need
shape files to make nice maps, as in the map shown above.
@@ -124,10 +115,10 @@ There are additional benefits of using this package:
entities](https://gl-li.netlify.com/2017/12/28/use-totalcensus-package-to-determine-relationship-between-geographic-entities/);
an application example.

Basic application
-----------------
How to use the package
----------------------

### the `read_xxxx()` functions
### `read_xxxx()` functions

The package has three functions to read decennial census, ACS 5-year
survey, and ACS 1-year survey: `read_decennial()`, `read_acs5year()`,
@@ -169,7 +160,7 @@ argument:
incomplete. To use decennial census 2010 data to fill the missing
values, set the argument to “dec2010”.

### the `search_xxxx()` functions
### `search_xxxx()` functions

There is a family of `search_xxx()` functions to help find table
contents, geoheaders, summary levels, geocomponents, FIPS codes and CBSA
@@ -178,13 +169,16 @@ codes.
The following examples demonstrate how to use these `read_xxx()` and
`search_xxx()` functions.

Examples
--------

### Median gross rent in cities with population over 65000

A property management company wants to know the most recent rents in
major cities in the US. How to get the data?

We first need to determine which survey to read. For most recent survey
data, we want to read 2016 ACS 1-year estimates, which provide data for
data, we want to read 2018 ACS 1-year estimates, which provide data for
geographic areas with population over 65000.

We also need to determine which data files to read. We know summary
@@ -193,7 +187,7 @@ level of cities is “160” or “place”. Browsing with
in state files of ACS 1-year estimates. So we will read all the state
files.

Then we need to check if 2016 ACS 1-year estimate has the rent data. We
Then we need to check if 2018 ACS 1-year estimate has the rent data. We
run `search_tablecontents("acs1")` to open the dataset with `View()` in
RStudio. You can provide keywords to search in the function but it is
better to do the search in RStudio with filters. There are so many
@@ -207,12 +201,12 @@ all geographic areas matches the conditions.
Below is the code that gives what we want. The first time you use
`read_xxxx()` functions to read data files, you will be asked to
download data generated from decennial census 2010 and summary files
required for this function call, in this case, 2016 ACS 1-year summary
required for this function call, in this case, 2018 ACS 1-year summary
files. Choose 1 to continue.

``` r
rent <- read_acs1year(
year = 2016,
year = 2018,
states = states_DC,
table_contents = "rent = B25064_001",
summary_level = "place"
56 changes: 49 additions & 7 deletions why_this_package/why_this_package.Rmd
@@ -126,14 +126,13 @@ ggsave(filename = "why_this_package/prov_urban_rural_population.png")
The data is in the 2014-2018 ACS 5-year survey.

```{r}
prov_home <- read_acs5year(states = c("MA", "RI"),
year = 2015,
prov_home <- read_acs5year(year = 2018,
states = c("MA", "RI"),
geo_headers = "CBSA",
table_contents = c("B01003_001", "B25077_001"),
summary_level = "block_group",
with_margin = FALSE) %>%
table_contents = "value = B25077_001",
summary_level = "block group",
dec_fill = "dec2010") %>%
.[CBSA == "39300"] %>%
setnames(c("B01003_001_e", "B25077_001_e"), c("population", "value")) %>%
# some missing value in home value shown as "." and so the whole column was
# read into character. change column back to numeric and remove NAs
.[, value := as.numeric(value)] %>%
@@ -151,7 +150,7 @@ ggmap(prov_map9) +
scale_size_area(max_size = 1) +
scale_color_continuous(low = "green", high = "red",
breaks = c(50000, 200000, 600000, 1000000),
labels = scales::unit_format("K", 1e-3)) +
labels = scales::unit_format(unit = "K", scale = 1e-3)) +
guides(size = FALSE) +
labs(color = "value ($)",
caption = "Source: ACS 5-year survey 2014-2018",
@@ -165,3 +164,46 @@ ggsave(file = "why_this_package/prov_home_values.png")
```

![home value](prov_home_values.png)
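
The cleaning step in the chunk above (a "." placeholder forces the whole column to character, so it is converted back to numeric and the missing rows are dropped) can be tried in isolation. The following is only a minimal sketch in base R on a made-up data frame; the column names and values are assumptions for illustration and do not come from any summary file.

```r
# Toy stand-in for a table returned by read_acs5year(); the values are
# invented for illustration only.
home <- data.frame(
  GEOID = c("bg1", "bg2", "bg3", "bg4"),
  value = c("250000", ".", "410000", "180000"),
  stringsAsFactors = FALSE
)

# "." is not a number, so as.numeric() turns it into NA. The coercion
# warning is silenced because the conversion is intentional.
home$value <- suppressWarnings(as.numeric(home$value))

# Drop rows with missing values, then sort in ascending order so that
# high values are drawn last (on top) when plotted.
home_clean <- home[!is.na(home$value), ]
home_clean <- home_clean[order(home_clean$value), ]

print(home_clean)
```

The same idea carries over to a `data.table` chain; the base R form is used here only to keep the sketch free of extra dependencies.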

## median house value of each block group in the continental USA
```{r}
library(totalcensus)
library(data.table)
library(ggmap)
us_map <- get_map("US", zoom = 4, color = "bw")
home_national <- read_acs5year(
year = 2018,
states = states_DC, # all 50 states plus DC
table_contents = "home_value = B25077_001",
summary_level = "block group"
) %>%
.[, home_value := as.numeric(home_value)] %>%
.[!is.na(home_value)] %>%
.[order(home_value)]
ggmap(us_map) +
geom_point(data = home_national,
aes(lon, lat, size = population, color = home_value),
alpha = 1) +
ylim(25, 49) +
scale_size_area(max_size = 1) +
scale_color_continuous(
low = "green",
high = "red",
breaks = c(100000, 500000, 1000000, 1500000, 2000000),
labels = scales::unit_format(unit = "K", scale = 1e-3)
) +
guides(size = FALSE) +
labs(color = "value ($)",
caption = "Source: ACS 5-year survey 2014-2018",
title = "Median Home Values in each block group") +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
legend.position = c(0.95, 0.1),
legend.key = element_blank(),
legend.margin = margin(0, 0, 0, 0),
legend.background = element_blank())
```
