-
Notifications
You must be signed in to change notification settings - Fork 14
Book outline: R, Docker, Postgres for SQL instruction
John David Smith edited this page Oct 16, 2018
·
12 revisions
FILES: 01-09
- intro, goals, requirements, etc.
- How to use this book
- Docker in a Windows environment
- Docker, Postgres and R
- Documenting our OS/Docker applications (who talks to who?)
- Intro diagram
- Pose a big question (a goal for the whole investigation)
FILES: 11-19
- tidy data is 3rd normal form: basic SQL & data frames
- simple retrieval (one table)
- Limit rows: filter / WHERE
- Limit columns: select / SELECT
- Mapping data types from SQL to R
- Column name mapping in SQL and R
- Calculating new variables: mutate and ?
- Overview of dplyr -> SQL command mapping
FILES: 21-29
- Data catalog
- What’s in a database
- Expanded diagram
- How to query the metadata
- Finding traps
- name reuse
- ER diagram
- Watch for dates
- What’s in a database
- Pause for hygiene issues
- passwords — how to store them (not in plain text)
- how to avoid abusing shared resources
- table indexes
FILES: 31-39
- Joins
- kinds: left, right, anti, etc.
- order of joins
- explain (belongs with table indexes or here?)
- Dividing the work between R and the DBMS
- What should stay on the server / DBMS side
- What goes on the R side
- Criteria:
- Legibility
- code for humans versus code for computers
- one-time use for exploration versus repeated / scheduled use
- Taking advantage of PostgreSQL functions
- Taking advantage of the R function library
- Beyond dplyr
- lubridate
- stringr
FILES: 51-59 A worked example
- Loading a new public dataset
- Dealing with real issues
FILES: 71-79
- Advanced topics
- explain queries
- instrumentation: adding timings to your code
- DB privileges (important if you want to create staging tables in the DB)
- Writing to a database
- scope
- Staging intermediate datasets
- ?