Meeting Notes 2018 10 06

John David Smith edited this page Oct 6, 2018 · 1 revision

October 6 Team meeting

Location: Study room 2B at Central Library, Downtown Portland -- 801 SW 10th

Next meeting: Saturday October 14, at 10 am

Topics

  • Presenting the project to R-Ladies PDX!! (Tuesday, Nov. 6)

    • We can have as much time as we want
    • Need to get a venue (e.g., CLSB? See note to Sophie below)
    • Distribute something a week before the meeting (summary, outline, link to the repo) so attendees can get a head start
    • Presenters are: Sophie, Maryanne, and Dipti
    • Purpose: “We are looking for feedback & suggestions”
  • Review of / walk-through Book Outline

    • Purpose: Simulate use of R and SQL in a corporate environment on your laptop.

    • Audience: R users who have had some exposure to dplyr and SQL.

    • Pattern of presentation in each Chapter: Since users are stronger in R:

      • Explain the motivation.
      • Start with R example.
      • Show how to do that in SQL.
    • Chapter: explain which things are easier to do in R or SQL; comparison of strengths and weaknesses of each. For example:

      • Describe the limits of SQL output formatting and show how dplyr is a good tool for first drafts
      • Describe how R can fail when operating on a large table. Use that as motivation for doing the operation in SQL instead.
    • Docker motivation

    • Useful book structure?

    • Scope

    • Contents

    • Sequence

    • dplyr / SQL issues seem to crop up in various places; how do we consolidate them?

  • Should chapters 5 and 6 be combined? Decision: basically, yes.

    • Chapter 5 is simpler

    • Chapter 6 could be an appendix or code snippet

      • Check on system2('docker', 'exec sql-pet ls petdir | grep "dvdrental.tar" ', stdout = TRUE, stderr = TRUE)
  • Should the two projects be merged? Decision: basically, yes.

    • sql-pet
    • r-database-docker
  • https://github.com/smithjd/sql-pet/wiki/Style-guide -- added stuff

  • Do we want / need a package?

    • Logic for doing it now:

      • Have functions available
      • Have tests for the functions
      • Known structure
      • Hadley Wickham’s book on R Packages: http://r-pkgs.had.co.nz/
    • Sample functions or functionality

      • Execute a “chapter head” function at the top of each chapter with all the library statements
      • Install dependencies
      • Check for system requirements
      • Text color ideas?
      • Wait_for_postgres
      • Various functions to fire up docker/postgres
    • Walk through Chapter 10 (21)

      • Explain_query -- submit
      • Advice: “leverage your local data experts to help you not get lost.”
      • Where do you get your data? “From the database” is the wrong answer.
      • pgModeler - https://pgmodeler.io/
      • The “public” schema is your starting point for data. Your DBA may point you to some other schema.
      • The difference between a base table and a view: SQL treats them the same, but for analysis, views hide complexity and create redundancy.
      • Cut 10.4.1 - difference between view and base table
      • Table dimensions: 10.4.2 Counting columns and name reuse. Row counts are essential.
      • Lintr package: https://cran.r-project.org/web/packages/lintr/index.html
    • Name

      • sqlpetr - package

        • MIT license
      • sql-pet -- book repository

  • Todo

    • Mary Anne: small tasks that can be written in 30 min

      • tidy data is 3rd normal form: basic SQL & data frames
    • Ian

      • Simple queries - 1 table - using “rentals” table
    • Sophie

      • SQL / dplyr comparison
      • ERD diagram
      • Check on a venue: Jessica Minnier or Ted Laderas for CLSB
    • Znmeb

      • Collect object RAM usage tools
      • Check out pgModeler on DVD rental database
      • Create package repository (actually, if I do it, it will be in my account unless we create a GitHub organization, so John should do this.)
    • JDS

      • Revise chapter 10
      • Reorganize repo (book, etc)
      • Use Ed’s notes to Ian as a start to “how to use this book” page
    • DM

      • Format Introduction chapter similar to Hadley’s book structure.

      • Typo correction

      • Create Appendix to

        • Introduction to SQL for beginners.
        • ANSI standards: Oracle, MySQL, DB2, Microsoft equivalents