Meeting Notes 2018 10 06

October 6 Team meeting

Location: Study room 2B at Central Library, Downtown Portland -- 801 SW 10th

Next meeting: Saturday October 14, at 10 am

Topics

Presenting the project to R-Ladies PDX!! (Tuesday, Nov. 6)
- We can have as much time as we want
- Need to get a venue (e.g., CLSB? See note to Sophie below)
- Distribute something a week before the meeting (summary, outline, link to the repo so attendees can get a head start)
- Presenters are: Sophie, Maryanne, and Dipti
- Purpose: “We are looking for feedback & suggestions”
Review / walk-through of the book outline
- Purpose: Simulate use of R and SQL in a corporate environment on your laptop.
- Audience: R users who have had some exposure to dplyr and SQL.
- Pattern of presentation in each chapter (since readers are stronger in R):
  - Explain the motivation.
  - Start with R example.
  - Show how to do that in SQL.
- Chapter: explain which things are easier to do in R or SQL; comparison of strengths and weaknesses of each. For example:
  - Describe the limits of SQL output formatting and show how dplyr is a good tool for first drafts
  - Describe how R can fail when operating on a large table. Use that as motivation for doing the operation in SQL instead.
- Docker motivation
- Useful book structure?
  - Hadley’s data science book http://r4ds.had.co.nz/
  - Deep Dive
  - SQL tutorials
- Scope
- Contents
- Sequence
- dplyr / SQL issues seem to crop up in several different places; how do we consolidate them?
Should chapters 5 and 6 be combined? Decision: basically, yes.
- Chapter 5 is simpler
- Chapter 6 could be an appendix or code snippet
  - Check on system2('docker', 'exec sql-pet ls petdir | grep "dvdrental.tar" ', stdout = TRUE, stderr = TRUE)
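
One way to sketch the check above without a shell pipe inside `system2()` is to list the directory in the container and do the matching in R. This is a minimal sketch under assumptions, not the book's code: the helper names `listing_has_file()` and `container_has_dump()` are hypothetical, while the container name `sql-pet`, the directory `petdir`, and the file `dvdrental.tar` are taken from the call above.

```r
# Hypothetical helper: TRUE if any line of a directory listing contains
# `file_name` (a fixed-string match, similar to what grep does in the call above).
listing_has_file <- function(listing, file_name) {
  any(grepl(file_name, listing, fixed = TRUE))
}

# Hypothetical wrapper: run `ls` inside the container and look for the dump file.
# Each argument is its own vector element, so no pipe to grep is needed.
container_has_dump <- function(container = "sql-pet",
                               dir = "petdir",
                               file_name = "dvdrental.tar") {
  listing <- system2("docker", c("exec", container, "ls", dir),
                     stdout = TRUE, stderr = TRUE)
  listing_has_file(listing, file_name)
}
```

Keeping the matching in a separate function means it can be tested without Docker running; only `container_has_dump()` needs the container.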
Should the two projects be merged? Decision: basically, yes.
- sql-pet
- r-database-docker
https://github.com/smithjd/sql-pet/wiki/Style-guide -- added stuff
Do we want / need a package?
- Logic for doing it now:
  - Have functions available
  - Have tests for the functions
  - Known structure
  - Hadley Wickham’s book on R Packages: http://r-pkgs.had.co.nz/
- Sample functions or functionality
  - Execute a “chapter head” function at the top of each chapter with all the library() statements
  - Install dependencies
  - Check for system requirements
  - Text color ideas?
  - Wait_for_postgres
  - Various functions to fire up docker/postgres
- Walk through Chapter 10 (21)
  - Explain_query -- submit
  - Advice: “leverage your local data experts to help you not get lost.”
  - Where do you get your data? “From the database” is the wrong answer.
  - pgModeler - https://pgmodeler.io/
  - The “public” schema is your starting point for data. Your DBA may point you to some other schema.
  - The difference between a base table and a view: querying them in SQL is no different, but for analysis, views hide complexity and create redundancy.
  - Cut 10.4.1 - difference between view and base table
  - Table dimensions: 10.4.2 Counting columns and name reuse. Row counts are essential.
  - lintr package: https://cran.r-project.org/web/packages/lintr/index.html
- Name
  - sqlpetr - package
    - MIT license
  - sql-pet -- book repository
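
As a rough illustration of the "Wait_for_postgres" idea in the sample functions list above, a polling helper might look like the following. This is only a sketch under assumptions: the name `wait_for_ready()`, its arguments, and the retry defaults are hypothetical and were not decided at the meeting, and the readiness check is passed in as a function rather than tied to any particular database call.

```r
# Hypothetical sketch: poll `check` until it returns TRUE or the tries run out.
# `check` is any zero-argument function, e.g. one that attempts a connection.
wait_for_ready <- function(check, max_tries = 30, delay_seconds = 1) {
  for (attempt in seq_len(max_tries)) {
    # Treat an error from `check` as "not ready yet" rather than failing.
    ready <- tryCatch(isTRUE(check()), error = function(e) FALSE)
    if (ready) {
      return(attempt)  # number of attempts it took
    }
    Sys.sleep(delay_seconds)
  }
  stop("service not ready after ", max_tries, " attempts")
}
```

In practice `check` might wrap a connection attempt (for example via `DBI::dbConnect()`), but that part is left out here so the retry logic can be tested on its own.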
Todo
- Mary Anne: small tasks that can be written in 30 min
  - tidy data is 3rd normal form: basic SQL & data frames
- Ian
  - Simple queries - 1 table - using “rentals” table
- Sophie
  - SQL / dplyr comparison
  - ERD diagram
  - Check on a venue: Jessica Minnier or Ted Laderas for CLSB
- Znmeb
  - Collect object RAM usage tools
  - Check out pgModeler on DVD rental database
  - Create package repository (actually, if I do it, it will be in my account unless we create a GitHub organization, so John should do this)
- JDS
  - Revise chapter 10
  - Reorganize repo (book, etc)
  - Use Ed’s notes to Ian as a start to “how to use this book” page
- DM
  - Format Introduction chapter similar to Hadley’s book structure.
  - Typo correction
  - Create Appendix to
    - Introduction to SQL for beginners.
    - ANSI standards: Oracle, MySQL, DB2, Microsoft equivalents
