@@ -28,7 +28,7 @@ The trick is avoiding duplicates. Your script might then need to say something l
  ## Lab work

  - You'll write methods to load continuously updated data into a database.
- - You'll set up scripts to perform each of the [methods of data loading](#data-loading) into DuckDB.
+ - You'll set up scripts to perform each of the [methods of data loading](#data-loading) into DuckDB.
  - You'll [pair](../docs/pairing.md) in your Lab group.
  - Work on branches and submit pull requests for the chunks of work — you decide what the "chunks" are.

@@ -40,13 +40,13 @@ The trick is avoiding duplicates. Your script might then need to say something l
  - We have monthly observations (rows) and monthly vintages (columns)

  | DATE | PCPI04M1 | PCPI04M2 | PCPI04M3 |
- | --------- | ---------: | ---------: | ---------: |
- | 2003:09 | 185.0 | 185.1 | 185.1 |
- | 2003:10 | 185.0 | 184.9 | 184.9 |
- | 2003:11 | 184.6 | 184.6 | 184.6 |
- | 2003:12 | 185.0 | 184.9 | 184.9 |
- | 2004:01 | #N/A | 185.8 | 185.8 |
- | 2004:02 | #N/A | #N/A | 186.3 |
+ | ------- | -------: | -------: | -------: |
+ | 2003:09 | 185.0 | 185.1 | 185.1 |
+ | 2003:10 | 185.0 | 184.9 | 184.9 |
+ | 2003:11 | 184.6 | 184.6 | 184.6 |
+ | 2003:12 | 185.0 | 184.9 | 184.9 |
+ | 2004:01 | #N/A | 185.8 | 185.8 |
+ | 2004:02 | #N/A | #N/A | 186.3 |

  - A revision of past data is released in February of each year.
  - A revision released in year `t` can update the values in years `t-5` to `t-1`.
@@ -58,9 +58,9 @@ The trick is avoiding duplicates. Your script might then need to say something l
  Suppose your organization wants to maintain a database of CPI data

  - Write a `get_latest_data` function that accepts a `pull_date` and returns the latest data available up to that date
- - For example, if the `pull_date` is 2004-01-15, the function should return the data from vintage `PCPI04M1`
+ - For example, if the `pull_date` is 2004-01-15, the function should return the data from vintage `PCPI04M1`
  - Write code that pulls the latest data at a given `pull_date` and loads it into a DuckDB database
- - You will implement each of the methods `append`, `trunc`, and `incremental`
+ - You will implement each of the methods `append`, `trunc`, and `incremental`
  - Loop over a range of `pull_dates` to simulate running the scripts on a daily basis
  - Compare the performance of each method (consistency and speed)

@@ -79,12 +79,12 @@ Suppose your organization wants to maintain a database of CPI data
  - `_append`
  - `_trunc`
  - `_inc`
- - Your code should accept a `pull_date` parameter and load the data up to that date
- - The script should be able to run multiple times without duplicating data
- - For incremental: a Python script may be easier than a SQL one
+ - Your code should accept a `pull_date` parameter and load the data up to that date
+ - The script should be able to run multiple times without duplicating data
+ - For incremental: a Python script may be easier than a SQL one
  4. On a notebook: simulate your organization running the scripts on a daily basis.
  - Start from empty tables
  - Loop over a range of `pull_dates` (e.g. 2000-01-01 to 2025-02-28) to simulate running the scripts on a daily basis.
  - If the loop takes way too long, use a shorter range
- - Compare the performance of each method (data consistency and speed)
- 5. [Submit the links to the pull request(s) via CourseWorks.](https://courseworks2.columbia.edu/courses/210480/assignments)
+ - Compare the performance of each method (data consistency and speed)
+ 5. [Submit links to the pull request(s) via CourseWorks.](https://courseworks2.columbia.edu/courses/210480/assignments)
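The `get_latest_data` task depends on mapping vintage column names to release dates. Below is a minimal sketch under several assumptions that are not part of the lab. First, the wide table has been read into a list of dicts (for example with `csv.DictReader`). Second, vintage names follow the `PCPI<yy>M<m>` pattern shown in the table, and two-digit years below 50 mean 20xx. Third, a vintage counts as available from the first day of its month. The helper `vintage_date` and its `release_day` parameter are hypothetical. Check the real release calendar before relying on the date rule.

```python
import re
from datetime import date

# Assumption: vintage columns look like PCPI04M1 (two-digit year, "M", month).
VINTAGE_RE = re.compile(r"^PCPI(\d{2})M(\d{1,2})$")


def vintage_date(column, release_day=1):
    """Date a vintage column becomes available, or None for non-vintage columns.

    Hypothetical rules: two-digit years below 50 mean 20xx, and a vintage
    counts as available from `release_day` of its month onward.
    """
    match = VINTAGE_RE.match(column)
    if match is None:
        return None
    yy, month = int(match.group(1)), int(match.group(2))
    year = 2000 + yy if yy < 50 else 1900 + yy
    return date(year, month, release_day)


def get_latest_data(rows, pull_date):
    """Return [(DATE, value), ...] from the newest vintage available on pull_date.

    `rows` is one dict per observation (DATE plus one key per vintage);
    `pull_date` is a datetime.date. Returns [] if no vintage was out yet.
    """
    if not rows:
        return []
    latest_release, latest_column = None, None
    for column in rows[0]:
        released = vintage_date(column)
        if released is None or released > pull_date:
            continue  # not a vintage column, or not released yet
        if latest_release is None or released > latest_release:
            latest_release, latest_column = released, column
    if latest_column is None:
        return []
    # "#N/A" marks observations that did not exist yet in that vintage.
    return [
        (row["DATE"], float(row[latest_column]))
        for row in rows
        if row[latest_column] != "#N/A"
    ]


# Rows copied from the table above.
rows = [
    {"DATE": "2003:09", "PCPI04M1": "185.0", "PCPI04M2": "185.1", "PCPI04M3": "185.1"},
    {"DATE": "2003:10", "PCPI04M1": "185.0", "PCPI04M2": "184.9", "PCPI04M3": "184.9"},
    {"DATE": "2003:11", "PCPI04M1": "184.6", "PCPI04M2": "184.6", "PCPI04M3": "184.6"},
    {"DATE": "2003:12", "PCPI04M1": "185.0", "PCPI04M2": "184.9", "PCPI04M3": "184.9"},
    {"DATE": "2004:01", "PCPI04M1": "#N/A", "PCPI04M2": "185.8", "PCPI04M3": "185.8"},
    {"DATE": "2004:02", "PCPI04M1": "#N/A", "PCPI04M2": "#N/A", "PCPI04M3": "186.3"},
]

print(get_latest_data(rows, date(2004, 1, 15)))
# -> [('2003:09', 185.0), ('2003:10', 185.0), ('2003:11', 184.6), ('2003:12', 185.0)]
```

With a pull date of 2004-01-15 this picks vintage `PCPI04M1`, matching the example in the task list. If the assumed release rule is wrong, only `vintage_date` needs to change.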
0 commit comments
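The three loading methods can be sketched as follows. This assumes specific meanings for each method, which may differ from the lab's own definitions:

- `append` inserts only observation dates not yet in the table, so later revisions are never picked up.
- `trunc` deletes everything and reloads the full snapshot.
- `incremental` inserts new dates and updates revised values.

Each loader takes the `(DATE, value)` snapshot pulled for one `pull_date`. The table names `cpi_append`, `cpi_trunc`, `cpi_inc`, the column names, and the `connect` helper are hypothetical. The code uses DuckDB when it is installed and falls back to Python's built-in `sqlite3` only so the sketch runs without it. The SQL is kept to statements both accept. Following the note that Python may be easier than SQL for incremental loads, the change detection happens in Python.

```python
import sqlite3

try:
    import duckdb

    def connect(path=":memory:"):
        return duckdb.connect(path)

except ImportError:
    # Fallback only so this sketch runs without DuckDB; the lab targets DuckDB.
    def connect(path=":memory:"):
        return sqlite3.connect(path)


def _ensure_table(con, table):
    con.execute(f"CREATE TABLE IF NOT EXISTS {table} (obs_date TEXT, cpi_value DOUBLE)")


def _insert(con, table, rows):
    if rows:  # skip empty batches
        con.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def load_append(con, data):
    """Append (assumed): insert only dates not already in the table."""
    _ensure_table(con, "cpi_append")
    known = {d for (d,) in con.execute("SELECT obs_date FROM cpi_append").fetchall()}
    _insert(con, "cpi_append", [(d, v) for d, v in data if d not in known])


def load_trunc(con, data):
    """Truncate and load (assumed): wipe the table, then reload the snapshot."""
    _ensure_table(con, "cpi_trunc")
    con.execute("DELETE FROM cpi_trunc")
    _insert(con, "cpi_trunc", list(data))


def load_inc(con, data):
    """Incremental (assumed): insert new dates, update revised values."""
    _ensure_table(con, "cpi_inc")
    current = dict(con.execute("SELECT obs_date, cpi_value FROM cpi_inc").fetchall())
    _insert(con, "cpi_inc", [(d, v) for d, v in data if d not in current])
    revised = [(v, d) for d, v in data if d in current and current[d] != v]
    if revised:
        con.executemany("UPDATE cpi_inc SET cpi_value = ? WHERE obs_date = ?", revised)


def contents(con, table):
    return sorted(con.execute(f"SELECT obs_date, cpi_value FROM {table}").fetchall())


# Snapshots copied from vintages PCPI04M1 and PCPI04M2 in the table above.
jan = [("2003:09", 185.0), ("2003:10", 185.0), ("2003:11", 184.6), ("2003:12", 185.0)]
feb = [("2003:09", 185.1), ("2003:10", 184.9), ("2003:11", 184.6),
       ("2003:12", 184.9), ("2004:01", 185.8)]

con = connect()
for snapshot in (jan, feb, feb):  # the repeated run checks for duplicates
    for load in (load_append, load_trunc, load_inc):
        load(con, snapshot)
```

Under these assumptions, the repeated run leaves no duplicates in any table. The truncate and incremental tables end up matching the February vintage. The append table keeps the January values for 2003:09, 2003:10, and 2003:12, which is the kind of consistency difference the comparison step asks about. In the daily simulation, each snapshot would come from `get_latest_data` for that `pull_date`.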