Skip to content

Latest commit

 

History

History
185 lines (114 loc) · 3.82 KB

lecture_07.md

File metadata and controls

185 lines (114 loc) · 3.82 KB

Lecture 7

Databases


Complexity

Carry-over from lecture 6


Packages

Whenever documentation says pip install …, you probably want to add it to your requirements.txt.


SQL

Who has experience with SQL? What have you used it for?




SQL syntax is largely the same across SQL databases:

  • SQLite
  • DuckDB
  • PostgreSQL
  • MySQL
  • BigQuery
  • Oracle

The variations are known as "dialects".



Example

  1. Download and unzip the access to electricity dataset as a CSV.

  2. Query it.

    SELECT * FROM read_csv('[path].csv');

SELECT * FROM read_csv(
   '[path].csv',
   nullstr=['', '..']
);

CREATE TABLE electricity AS SELECT ...;

What's a question we might want to ask?


Column names with spaces need double quotes. Alternatively, normalize_names.


SELECT
   "Country name",
   "2022 [YR2022]" - "1990 [YR1990]" AS diff
FROM electricity
ORDER BY diff DESC;

SQL similarities to pandas

  • Tabular
  • DuckDB: read_csv()
  • Tables are like DataFrames
  • Columns have types
  • Column-based operations
  • SELECT is like boolean indexing
  • GROUP BY is like groupby()

pandas allows you to build up operations over multiple lines; harder to do that in SQL.


Views


Clients


Lots of ways to connect to databases from Python, including:


Drivers

Allow you to use the same Python syntax across databases


Writing data

How would you take data from an API and get it into a database?






  1. In the Google Cloud Console, make sure your Project is selected.
  2. Open BigQuery.
  3. Enable the API.
  4. Open a public dataset.
    • Try Category of Economics and Price of Free

SELECT company_name, COUNT(*) AS num_complaints
FROM `bigquery-public-data.cfpb_complaints.complaint_database`
GROUP BY company_name
ORDER BY num_complaints DESC;

Readings

Coming soon