Skip to content

Latest commit

 

History

History
 
 

loadExample

Data from 2011 ACS 1-year PUMS fields American Community Survey

This is an example how to load the Census data used in the book. We have the data pre-prepared here so you do not need to perform any of these steps unless you want to learn about SQLscrewdriver (not a requirement for later exercises in the book).

Original data is not included here for reasons of license and size.

To download data:

Leave data in these directories and one directory above csv_hus and csv_pus run the following at the command line:

java -classpath SQLScrewdriver.jar:h2-1.3.170.jar \
   com.winvector.db.LoadFiles \
   file:dbDef.xml , hus \
   file:csv_hus/ss11husa.csv file:csv_hus/ss11husb.csv
java -classpath SQLScrewdriver.jar:h2-1.3.170.jar \
   com.winvector.db.LoadFiles \
   file:dbDef.xml , pus \
   file:csv_pus/ss11pusa.csv file:csv_pus/ss11pusb.csv

The jars SQLScrewdriver.jar h2-1.3.170.jar are available here. Note: Windows users should use ";" to sparate jars instead of ":".

Where the XML file dbDef.xml is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
 <comment>testdb</comment>
 <entry key="user">u</entry>
 <entry key="password">u</entry>
 <entry key="driver">org.h2.Driver</entry>
 <entry key="url">jdbc:h2:H2DB \
  ;LOG=0;CACHE_SIZE=65536;LOCK_MODE=0;UNDO_LOG=0</entry>
</properties>

And the follow-up R commands are:

options( java.parameters = "-Xmx2g" )
library(RJDBC)
drv = JDBC("org.h2.Driver","h2-1.3.170.jar",identifier.quote="'")
options=";LOG=0;CACHE_SIZE=65536;LOCK_MODE=0;UNDO_LOG=0"
conn = dbConnect(drv,paste("jdbc:h2:H2DB",options,sep=''),"u","u")
dhus = dbGetQuery(conn,"SELECT * FROM hus WHERE ORIGRANDGROUP<=1")
dpus = dbGetQuery(conn,"SELECT pus.* FROM pus WHERE pus.SERIALNO IN \
(SELECT DISTINCT hus.SERIALNO FROM hus WHERE hus.ORIGRANDGROUP<=1)")
dbDisconnect(conn)
save(dhus,dpus,file='phsample.RData')

This produces phsample.RData (a already prepared copy of which is available here, so you can skip all of the above steps). Data can then be loaded with:

load('phsample.RData')

This loads dhus (2982 random households) and dpus (6279 personal records joined with the households from dhus).

Note on downloading files

A remote source could update files at any time (and even worse they may change links). The data we used in this book had the following cryptographic hashses (downloaded in March and February and 2013, and also matching on a June 4 2014 download):

  • csv_hus.zip shasum: cdfdfb326956e202fdb560ee34471339ac8abd6c
  • csv_pus.zip shasum: aa0f4add21e327b96d9898b850e618aeca10f6d0
  • csv_hus/ss11husa.csv shasum: 72a0212e8313af83354460af1daeb54a7321f429
  • csv_hus/ss11husb.csv shasum: 1edaedb79f7766a9712fc0a8a7b674e427e5ba56
  • csv_pus/ss11pusa.csv shasum: 574d657d2094087722a054e3c16c277ece34f01c
  • csv_pus/ss11pusb.csv shasum: 89a6401524e2f25e02633849828462fa1204ea91