Skip to content

Book outline: R, Docker, Postgres for SQL instruction

John David Smith edited this page Oct 6, 2018 · 12 revisions

Here’s a first draft:

FILES: 01-09

  • intro, goals, requirements, etc.
  • How to use this book
  • Docker, Postgres and R
  • Documenting our OS/Docker applications (who talks to who?)
    • Intro diagram
  • Pose a big question (a goal for the whole investigation)

FILES: 11-19

  • simple retrieval (one table)
    • Limit rows: filter / WHERE
    • Limit columns: select / SELECT
    • Mapping data types from SQL to R
    • Column name mapping in SQL and R
    • Calculating new variables: mutate and ?
  • Overview of dplyr -> SQL command mapping

FILES: 21-29

  • Data catalog
    • What’s in a database
      • Expanded diagram
      • How to query the metadata
    • Finding traps
      • name reuse
      • ER diagram
  • Pause for hygiene issues
    • passwords — how to store them (not in plain text)
    • how to avoid abusing shared resources
    • table indexes

FILES: 31-39

  • Joins
    • kinds: left, right, anti, etc.
    • order of joins
    • explain (belongs with table indexes or here?)
  • Dividing the work between R and the DBMS
    • What should stay on the server / DBMS side
    • What goes on the R side
    • Criteria:
      • Legibility
      • code for humans versus code for computers
      • one-time use for exploration versus repeated / scheduled use
  • Taking advantage of PostgreSQL functions
  • Taking advantage of the R function library
    • Beyond dplyr
    • lubridate
    • stringr

FILES: 51-59 A worked example

  • Loading a new public dataset
  • Dealing with real issues

FILES: 71-79

  • Advanced topics
    • explain queries
    • instrumentation: adding timings to your code
    • DB privileges (important if you want to create staging tables in the DB)
  • Writing to a database
    • scope
    • ?