Meeting Notes 2018 10 06
October 6 Team meeting
Location: Study room 2B at Central Library, Downtown Portland -- 801 SW 10th
Next meeting: Saturday October 14, at 10 am
Topics
-
Presenting the project to R-Ladies PDX!! (Tuesday, Nov. 6)
- We can have as much time as we want
- Need to get a venue (e.g., CLSB? See note to Sophie below)
- Distribute something a week before the meeting (summary, outline, link to the repo) so attendees can get a head start
- Presenters are: Sophie, Maryanne, and Dipti
- Purpose: “We are looking for feedback & suggestions”
-
Review of / walk-through Book Outline
-
Purpose: Simulate use of R and SQL in a corporate environment on your laptop.
-
Audience: R users who have had some exposure to dplyr and SQL.
-
Pattern of presentation in each Chapter: Since users are stronger in R:
- Explain the motivation.
- Start with R example.
- Show how to do that in SQL.
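A minimal sketch of the R-first, SQL-second pattern, assuming the dplyr and dbplyr packages are installed. The `rental` table and its columns are hypothetical stand-ins for the DVD rental data, and `simulate_postgres()` is used so that no running database is needed. The generated SQL text can vary between dbplyr versions.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for a "rental" table; the column names are assumptions.
# lazy_frame() builds a table description without connecting to a database.
rental <- lazy_frame(
  rental_id = 1L, customer_id = 1L, inventory_id = 1L,
  con = simulate_postgres(), .name = "rental"
)

# Step 1: the R (dplyr) version of the question
# "how many rentals does each customer have?"
rentals_per_customer <- rental %>%
  group_by(customer_id) %>%
  summarise(n_rentals = n()) %>%
  arrange(desc(n_rentals))

# Step 2: show the SQL that dplyr would send to PostgreSQL
sql_text <- as.character(sql_render(rentals_per_customer))
cat(sql_text, "\n")
```

Putting the two forms side by side lets a reader who knows dplyr read the SQL as a translation of something familiar.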
-
Chapter: explain which things are easier to do in R or SQL; comparison of strengths and weaknesses of each. For example:
- Describe the limits of SQL output formatting and show how dplyr is a good tool for first drafts
- Describe how R can fail when operating on a large table. Use that as motivation for doing the operation in SQL instead.
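One way to motivate the large-table point is to estimate how much RAM a table would need before pulling it into R. The sketch below uses only base R. The function name `estimate_bytes` and the sample columns are hypothetical, and scaling up from a small sample gives only a rough estimate.

```r
# Estimate the memory a data frame would need at a larger row count
# by measuring a small sample and scaling linearly.
estimate_bytes <- function(sample_df, target_rows) {
  bytes_per_row <- as.numeric(object.size(sample_df)) / nrow(sample_df)
  bytes_per_row * target_rows
}

# Hypothetical sample: one integer column and one double column
sample_df <- data.frame(id = seq_len(10000), amount = runif(10000))

est <- estimate_bytes(sample_df, target_rows = 1e9)
cat(sprintf("Estimated size for 1e9 rows: %.1f GB\n", est / 1024^3))
```

If the estimate exceeds the laptop's memory, that is a reason to aggregate or filter in SQL and bring back only the result.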
-
Docker motivation
-
Useful book structure?
- Hadley’s data science book http://r-pkgs.had.co.nz/
- Deep Dive
- SQL tutorials
-
Scope
-
Contents
-
Sequence
-
dplyr / SQL issues crop up in various places; how do we consolidate them?
-
-
Should chapters 5 and 6 be combined? Decision: basically, yes.
-
Chapter 5 is simpler
-
Chapter 6 could be an appendix or code snippet
- Check on system2('docker', 'exec sql-pet ls petdir | grep "dvdrental.tar" ', stdout = TRUE, stderr = TRUE)
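The call above puts a shell pipe inside the argument string, and how that pipe is handled can differ by platform. A hedged alternative is to pass the arguments as a character vector and do the matching in R. The function names `listing_has_file` and `container_has_file` are hypothetical. The container, directory, and file names are taken from the call above.

```r
# Pure helper: does a directory listing contain exactly this file name?
listing_has_file <- function(listing, filename) {
  any(trimws(listing) == filename)
}

# Hypothetical wrapper: list a directory inside a running container
# and check for the file in R instead of piping to grep.
container_has_file <- function(container, dir, filename) {
  listing <- system2(
    "docker", c("exec", container, "ls", dir),
    stdout = TRUE, stderr = TRUE
  )
  listing_has_file(listing, filename)
}

# Usage (requires Docker and a running container named "sql-pet"):
# container_has_file("sql-pet", "petdir", "dvdrental.tar")
```

Keeping the match in a separate helper means it can be tested without Docker.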
-
-
Should the two projects be merged? Decision: basically, yes.
- sql-pet
- r-database-docker
-
https://github.com/smithjd/sql-pet/wiki/Style-guide -- added stuff
-
Do we want / need a package?
-
Logic for doing it now:
- Have functions available
- Have tests for the functions
- Known structure
- Hadley Wickham’s book on R Packages: http://r-pkgs.had.co.nz/
-
Sample functions or functionality
- Execute a “chapter head” function at the top of each chapter with all the Library statements
- Install dependencies
- Check for system requirements
- Text color ideas?
- Wait_for_postgres
- Various functions to fire up docker/postgres
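A "wait for postgres" helper could be sketched as a retry loop around any function that attempts a connection. This is an illustration under assumptions, not the package's implementation. The name `wait_for_connection` and its arguments are hypothetical, and the real connection call (for example via DBI) is left to the caller.

```r
# Retry a connection attempt until it succeeds or we run out of tries.
# try_connect: a zero-argument function that returns a connection or throws an error.
wait_for_connection <- function(try_connect, max_tries = 30, pause = 1) {
  for (i in seq_len(max_tries)) {
    con <- tryCatch(try_connect(), error = function(e) NULL)
    if (!is.null(con)) {
      return(con)
    }
    Sys.sleep(pause)  # give the database time to finish starting
  }
  stop("Database not ready after ", max_tries, " attempts")
}

# Usage (assumes the DBI and RPostgres packages and a running container;
# the connection details here are placeholders):
# con <- wait_for_connection(function() {
#   DBI::dbConnect(RPostgres::Postgres(), host = "localhost", port = 5432,
#                  user = "postgres", password = "postgres", dbname = "dvdrental")
# })
```

Passing the connection attempt in as a function keeps the retry logic independent of any particular database driver.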
-
Walk through Chapter 10 (21)
- Explain_query -- submit
- Advice: “leverage your local data experts to help you not get lost.”
- Where do you get your data? “From the database” is the wrong answer.
- pgModeler - https://pgmodeler.io/
- The “public” schema is your starting point for data. Your DBA may point you to some other schema.
- The difference between a base table and a view: SQL treats them the same, but for analysis, views hide complexity and create redundancy.
- Cut 10.4.1 - difference between view and base table
- Table dimensions: 10.4.2 Counting columns and name reuse. Row counts are essential.
- Lintr package: https://cran.r-project.org/web/packages/lintr/index.html
-
Name
-
sqlpetr - package
- MIT license
-
sql-pet -- book repository
-
-
-
Todo
-
Mary Anne: small tasks that can be written in 30 min
- tidy data is 3rd normal form: basic SQL & data frames
-
Ian
- Simple queries - 1 table - using “rentals” table
-
Sophie
- SQL / dplyr comparison
- ERD diagram
- Check on a venue: Jessica Minnier or Ted Laderas for CLSB
-
Znmeb
- Collect object RAM usage tools
- Check out pgModeler on DVD rental database
- Create package repository (actually, if I do it, it will be in my account unless we create a GitHub organization, so John should do this)
-
JDS
- Revise chapter 10
- Reorganize repo (book, etc)
- Use Ed’s notes to Ian as a start to “how to use this book” page
-
DM
-
Format Introduction chapter similar to Hadley’s book structure.
-
Typo correction
-
Create Appendix to
- Introduction to SQL for beginners.
- ANSI standards: Oracle, MySQL, DB2, Microsoft equivalents