Add "Advanced Python DS Ecosystem" course materials #3

Draft · wants to merge 47 commits into base: main
50e8272
Add course overview
ccauet Oct 9, 2023
e1dae85
Add DB notebooks
ccauet Oct 9, 2023
bfa40ee
Fix data path
ccauet Oct 9, 2023
bd76b0d
add polars notebooks
jkuehlem Oct 18, 2023
792d457
Update dependencies
ccauet Oct 20, 2023
af812b0
Add object-oriented programming notebook
Oct 21, 2023
da2a63a
Fix some typos and add links to db notebooks in index.
Oct 23, 2023
426e9d3
streamlit example of stromnetz
sd-p8 Oct 23, 2023
d4057fe
Update scipy and statsmodels
ccauet Oct 23, 2023
d059226
Upgrade scikit-learn
ccauet Oct 23, 2023
37612d3
Update project dependencies
ccauet Oct 23, 2023
db71a8d
Update copyright notice
ccauet Oct 23, 2023
c981687
Add copyright notice
ccauet Oct 23, 2023
7345877
Fix relativ path
ccauet Oct 23, 2023
9cee91a
Linting
ccauet Oct 23, 2023
63833b9
Update requirement.txt
ccauet Oct 23, 2023
3b9387d
Remove an incompatible package by hand
ccauet Oct 23, 2023
4664ba3
Create requirements file by hand
ccauet Oct 23, 2023
e659751
Remove python from req file
ccauet Oct 23, 2023
a93828c
No versions constraints
ccauet Oct 23, 2023
dc5475e
Add compose file to setup aux services
ccauet Oct 23, 2023
538043f
Update port mapping
ccauet Oct 23, 2023
048c534
Update MongoDB client to include port and authentication.
Oct 24, 2023
4ddb681
Add polars notebook links to index.
Oct 24, 2023
ce3d77c
Minor changes in polars notebooks.
Oct 24, 2023
839e152
Update dependencies
ccauet Oct 24, 2023
18f0c25
update streamlit example
sd-p8 Oct 24, 2023
92f778a
add streamlit config
sd-p8 Oct 24, 2023
18d2140
add oop and streamlit to index
sd-p8 Oct 24, 2023
59e0f1b
add exercise streamlit
sd-p8 Oct 24, 2023
38ab2b7
fix path to data
sd-p8 Oct 24, 2023
8c6077e
re-add pathlib
sd-p8 Oct 25, 2023
869b536
add licences
sd-p8 Oct 25, 2023
22572c6
Add 'timeit' to polars notebook.
Oct 30, 2023
177334c
Add the current course index to the first level and update paths acco…
Oct 30, 2023
f99fb8d
Add type hints to OOD notebook.
Oct 30, 2023
8451101
add output of db schema information
sd-p8 Nov 13, 2023
f35f3ff
Fix some typos and remove unnecessary lines of code from polars noteb…
Nov 24, 2023
befd5b2
Remove duplicated index, update paths in ape-index, update polars int…
Nov 24, 2023
8347d06
Add APE to official data-science-learning-paths index.
Nov 24, 2023
5040f00
Rework polars notebooks. Split in exercise and solution and add a 'be…
Nov 24, 2023
ae78aaf
Delete old polars notebook.
Nov 24, 2023
591b4ae
Try to use black format in polars notebooks.
Nov 24, 2023
8e47f5b
Add more information for the DB-API sqlite notebook and update the in…
Nov 24, 2023
dd10538
Add introductory text for ORM notebook.
Nov 24, 2023
55ba215
Add introductory text for NoSQL notebook with MongoDB.
Nov 24, 2023
8a525ad
Minor update in pandas-sql notebook.
Nov 24, 2023
Binary file modified notebooks/data-science-learning-paths-concept.png
14 changes: 14 additions & 0 deletions notebooks/data-science-learning-paths.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,20 @@
"- **Index notebook**: [📓Machine Learning on Time Series](index/mlts-machine-learning-time-series.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Python Data Science Ecosystem [APE]\n",
"\n",
"A 2-day advanced course on independent topics, including development of Python packages with Poetry, object-oriented programming, an introduction to databases, dashboards with Streamlit, and an introduction to the Polars library.\n",
"\n",
"- **Level**: Advanced\n",
"- **Duration**: 2 days\n",
"- **Prerequisites**: DAP+MLP\n",
"- **Index notebook**: [📓Advanced Python Data Science Ecosystem](index/ape-advanced-python-ds-ecosystem-2day.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down
Binary file added notebooks/db/RDB_example.png
Binary file added notebooks/db/data/firmenlauf_demo.db
5 changes: 5 additions & 0 deletions notebooks/db/data/participants.csv
@@ -0,0 +1,5 @@
First Name;Last Name;Shoe Size;Shirt Size;Distance;Team;
Anna;Einstein;38;38;5;3;
Marius;Fermi;44;60;2;5;
James;Pauli;44;42;10;8;
Selma;Meitner;41;40;10;3;
4 changes: 4 additions & 0 deletions notebooks/db/data/teams.csv
@@ -0,0 +1,4 @@
ID;Size;Shoe Color;;;;
3;16;Red;;;;
5;15;Green;;;;
8;11;Purple;;;;
10 changes: 10 additions & 0 deletions notebooks/db/data/training.csv
@@ -0,0 +1,10 @@
ID;Date(YYYY-MM-DD);Time(mm:ss);Distance(km);Runner;;
1;2023-07-15;39:00;4.5;Anna;;
2;2023-08-05;58:00;3;Marius;;
3;2023-08-07;34:45;1.6;James;;
4;2023-07-08;32:00;4.05;Selma;;
5;2023-07-18;35:00;4.5;Anna;;
6;2023-07-25;30:00;4.5;Anna;;
7;2023-09-07;37:00;5.456;Selma;;
8;2023-07-19;41:51;2.24;James;;
9;2023-07-28;32:06;1.6;James;;
261 changes: 261 additions & 0 deletions notebooks/db/db-pandas-sql.ipynb
@@ -0,0 +1,261 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9f37e58c-94cf-4930-8ad9-1f724740083f",
"metadata": {},
"source": [
"# Pandas + SQL(Alchemy)\n",
"\n",
"Pandas is a powerful tool for working with data frames, and it can also talk to databases:\n",
"we can load single tables from an existing database into dataframes, or create new tables from dataframes, without specifying any schema.\n",
"SQLAlchemy does this under the hood.\n",
"\n",
"Working with Pandas and SQL always loads tables into a dataframe; we do *not* get Python objects as we did with the ORM in SQLAlchemy.\n",
"\n",
"Be aware that dataframes do *not* know about any relations you might have established with SQLAlchemy!\n",
"\n",
"**Be aware**: Working with SQL+Pandas is usually only a comfortable shortcut for simple use cases and \"quick-and-dirty\" approaches, e.g. when you need a simple lookup of some data. It also works well if your amount of data fits into a dataframe and you plan to load it once from the DB and do everything else in Pandas anyway.\n",
"For more complex tasks involving joining, aggregating, grouping, or selecting on a large data volume, you should rely on the features of your (relational) DB itself and perform these steps via SQLAlchemy."
]
},
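The trade-off described above can be sketched with a small in-memory SQLite table (an editorial illustration with made-up data, not part of the course notebook): an aggregation can run inside the DB so that only the small result is transferred, or the full table can be loaded and aggregated in Pandas.

```python
import sqlite3

import pandas as pd

# Small in-memory demo table (illustrative data only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (runner TEXT, distance REAL)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?)",
    [("Anna", 4.5), ("Anna", 5.0), ("James", 1.6)],
)

# Option 1: aggregate inside the DB, transfer only the aggregated rows
df_sql = pd.read_sql(
    "SELECT runner, SUM(distance) AS total FROM runs GROUP BY runner", conn
)

# Option 2: transfer the whole table, aggregate in Pandas
df_all = pd.read_sql("SELECT * FROM runs", conn)
df_pd = df_all.groupby("runner", as_index=False)["distance"].sum()

conn.close()
```

Both routes give the same totals; the difference is how much data crosses the DB boundary, which is what matters at large volumes.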
{
"cell_type": "markdown",
"id": "92f25dbc-afab-4413-881e-0a8c9c6787c2",
"metadata": {},
"source": [
"# Read from a DB with Pandas"
]
},
{
"cell_type": "markdown",
"id": "0112bdf6-91e2-444a-b344-aefe7f185b7f",
"metadata": {},
"source": [
"## Open Connection\n",
"\n",
"First, we have to establish a connection to the DB we have already filled.\n",
"In this example, we use the SQLite DB-API (`sqlite3`) for this task.\n",
"You can do the same with other systems, e.g. PostgreSQL, using the respective DB-API."
]
},
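A side note (an editorial sketch, not from the course material): a plain `sqlite3` connection is not closed automatically, and sqlite3's own `with conn:` only manages transactions, not closing. Wrapping the connection in `contextlib.closing` guarantees cleanup even if a query raises. The in-memory DB below is a stand-in for a file such as `data/firmenlauf_demo.db`.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() calls conn.close() when the block exits, even on errors.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE demo (x INTEGER)")
    conn.execute("INSERT INTO demo VALUES (1), (2)")
    df_demo = pd.read_sql("SELECT * FROM demo", conn)
```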
{
"cell_type": "code",
"execution_count": null,
"id": "890c3e23-b4e8-4a5c-a8b1-1d9b36b2a688",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import sqlite3"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ab98ccf-b753-4b3a-a24c-2093aacc3b04",
"metadata": {},
"outputs": [],
"source": [
"connection = sqlite3.connect(\"data/firmenlauf_demo.db\")"
]
},
{
"cell_type": "markdown",
"id": "070ba9db-9c33-4380-8e85-a9dcc43c3583",
"metadata": {},
"source": [
"## Run SQL Queries with Pandas"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "766d3ccf-b612-48a7-96f3-2866b98e7dbb",
"metadata": {},
"outputs": [],
"source": [
"# Load the whole table \"teams\" into a dataframe\n",
"df_teams = pd.read_sql(\"SELECT * FROM teams\", connection)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "927e0dad-675f-4b05-af81-30be0cd7c544",
"metadata": {},
"outputs": [],
"source": [
"df_teams"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4100ce2e-11e1-476d-9375-192ae74c5de0",
"metadata": {},
"outputs": [],
"source": [
"# For better readability, we define the query string separately\n",
"# Note that we have to JOIN two tables explicitly in SQL when combining data from both\n",
"sql_query_runner_shoe_color = \"\"\"\n",
" SELECT runners.first_name, runners.shoe_size, teams.shoe_color \n",
" FROM runners\n",
" JOIN teams\n",
" ON runners.team_id = teams.id\n",
"\"\"\"\n",
"\n",
"df_shoes = pd.read_sql(sql_query_runner_shoe_color, connection)"
]
},
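When part of a query comes from user input, it is safer to pass it via `read_sql`'s `params` argument than to splice it into the SQL string by hand. A self-contained sketch with a throwaway in-memory table (the data is illustrative, not the course DB):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (id INTEGER, shoe_color TEXT)")
conn.executemany("INSERT INTO teams VALUES (?, ?)", [(3, "Red"), (5, "Green")])

# The "?" placeholder is filled in by the DB driver, which escapes the
# value and thus prevents SQL injection.
wanted = "Red"
df_red = pd.read_sql(
    "SELECT * FROM teams WHERE shoe_color = ?", conn, params=(wanted,)
)
conn.close()
```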
{
"cell_type": "code",
"execution_count": null,
"id": "30c95984-57ff-447d-adaf-f18f8c4a1f19",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"df_shoes"
]
},
{
"cell_type": "markdown",
"id": "3a03cd89-70ca-4a38-919f-9c966db4548b",
"metadata": {},
"source": [
"# Add a Table to a DB with Pandas\n",
"\n",
"Let's say we want to add a new table containing the ranking from the actual Firmenlauf and the prize money each team gets.\n",
"We first create a dataframe and then add it as a new table to the DB.\n",
"Note that we cannot add any relationships as we did with SQLAlchemy, since we do not use an ORM here."
]
},
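Two `to_sql` parameters are worth knowing before the next cells (a sketch against a throwaway in-memory DB, assuming nothing beyond the standard Pandas API): `index=False` keeps the dataframe index from becoming an extra column, and `if_exists` controls whether an existing table is replaced or appended to.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df_a = pd.DataFrame({"rank": [1, 2], "team_id": [4, 3], "prize": [5000, 2000]})
df_b = pd.DataFrame({"rank": [3], "team_id": [2], "prize": [1000]})

# Without index=False, the dataframe index would be stored as a column.
df_a.to_sql("rankings", conn, if_exists="replace", index=False)
# if_exists="append" adds rows instead of dropping the existing table.
df_b.to_sql("rankings", conn, if_exists="append", index=False)

df_back = pd.read_sql("SELECT * FROM rankings", conn)
conn.close()
```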
{
"cell_type": "code",
"execution_count": null,
"id": "4ae82b51-6ab6-4552-bac1-937ceb11a8dc",
"metadata": {},
"outputs": [],
"source": [
"df_ranking = pd.DataFrame({\"rank\": [1, 2, 3, 4], \"team_id\": [4, 3, 2, 1], \"prize\": [5000, 2000, 1000, 500]})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eaa43902-2fd7-4ceb-a1cb-1804b39a0dcc",
"metadata": {},
"outputs": [],
"source": [
"df_ranking"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d95e6c3f-cd0b-4bba-b674-a443d83e93c7",
"metadata": {},
"outputs": [],
"source": [
"# Write the dataframe as a table to the DB, replacing it if it already exists (in the real world, this could cause data loss!).\n",
"df_ranking.to_sql(\"rankings\", connection, if_exists=\"replace\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eedfced1-3a67-4f97-a253-f3996658f988",
"metadata": {},
"outputs": [],
"source": [
"# Read the newly added table back into a dataframe\n",
"pd.read_sql(\"SELECT * FROM rankings\", connection)"
]
},
{
"cell_type": "markdown",
"id": "1cc2e5d0-c7da-4ae0-a00e-d2c2a247348b",
"metadata": {},
"source": [
"# Show DB schema information"
]
},
{
"cell_type": "markdown",
"id": "650c92e5-d1af-4168-9ff4-15e7be36dea1",
"metadata": {},
"source": [
"The `sqlite_master` table contains all information about the DB schema.\n",
"\n",
"To get a more structured output of the available tables and their columns, we define the following function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b51d21e5-daa5-4060-8043-f20d19a0151e",
"metadata": {},
"outputs": [],
"source": [
"def table_info(c, conn):\n",
"    \"\"\"Print all columns of every table in the DB.\n",
"\n",
"    c : cursor object\n",
"    conn : database connection object\n",
"    \"\"\"\n",
"    tables = c.execute(\"SELECT name FROM sqlite_master WHERE type='table';\").fetchall()\n",
"    for (table_name,) in tables:  # tables is a list of single-item tuples\n",
"        # LIMIT 0 fetches only the header, i.e. the column names\n",
"        table = pd.read_sql_query(f\"SELECT * FROM {table_name} LIMIT 0\", conn)\n",
"        print(table_name)\n",
"        for col in table.columns:\n",
"            print(\"\\t\" + col)\n",
"        print()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0de9d90-3eff-47be-8556-98cd03f9fde4",
"metadata": {},
"outputs": [],
"source": [
"cur = connection.cursor()\n",
"table_info(cur, connection)"
]
},
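An alternative to the helper above (an editorial sketch, not from the notebook): SQLite's built-in `PRAGMA table_info` lists the columns of a table together with their declared types and primary-key flags, without a detour through Pandas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (id INTEGER PRIMARY KEY, shoe_color TEXT)")

# Each returned row has the shape (cid, name, type, notnull, default_value, pk)
columns = conn.execute("PRAGMA table_info(teams)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name}: {col_type}" + (" (primary key)" if pk else ""))

conn.close()
```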
{
"cell_type": "markdown",
"id": "d3eb6810-a3f2-47e3-ad55-5ecf82c8f100",
"metadata": {},
"source": [
"---\n",
"_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © [Point 8 GmbH](https://point-8.de)_"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}