This course is part of a series of data science modules. It assumes you have completed an introduction to Python and something similar to the Data Analyses & Visualisation course: https://github.com/raoulg/MADS-DAV
The lessons can be found inside the notebooks folder.
The source code for the lessons can be found in the src folder.
The book we will be using is Understanding Deep Learning. Buying the book is highly recommended, but it is also available as a PDF here: https://udlbook.github.io/udlbook/
├── README.md <- This file
├── .gitignore <- Stuff not to add to git
├── .lefthook.yml <- Config file for lefthook
├── pyproject.toml <- Human-readable file specifying the libraries I installed to run the code,
│ and their versions.
├── data
│ ├── external <- Data from third party sources.
│ ├── processed <- The processed datasets
│ └── raw <- The original, raw data
│
├── models <- Trained models
│
├── notebooks <- Jupyter notebooks. Naming convention is xx_name_of_module.ipynb where
│ xx is the number of the lesson
├── presentations <- All PowerPoint presentations, in PDF format
├── references <- Background information
│ ├── codestyle <- Code style standards
│ └── leerdoelen <- Learning goals per lesson, including pages to read and videos to watch
│
├── reports <- Generated analysis like PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
For this project you will need some dependencies.
The project uses Python 3.10, and the dependencies are defined in the pyproject.toml file. You will also find requirements.lock files, but these were generated on a Mac, so they lack CUDA-specific dependencies.
The .lefthook.yml file is used by lefthook, which lints and cleans the code before I commit it. As a student you probably don't commit to this repository, so you can ignore it.
I have separated the dataset management and the training-loop code into their own packages. You will find them as dependencies in the project:
Both are used extensively in the notebooks; separating them makes it easier for you to reuse the code in your own repositories. In addition, you can treat the packages as "extra material": if you are already an experienced programmer, you can study how they are set up.
- Watch the introduction video about Rye.
- You skipped the video, right? Now go back to the first step and actually watch it. I'll wait.
- Install Rye with:
curl -sSf https://rye.astral.sh/get | bash
Run through the installer with these answers:
- platform linux: yes
- preferred package installer: uv
- Run a Python installed and managed by Rye
- which version of Python should be used as default: 3.10
- should the installer add Rye to PATH via .profile?: y
- Run in the CLI:
source "$HOME/.rye/env"
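To confirm the installer worked, you can check that the rye command is now found on your PATH. This is a minimal sketch, assuming a bash shell and the env file location used above; require_cmd is a hypothetical helper, not part of Rye or this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Rye or this repo):
# report whether a command can be found on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1 (is it installed and on your PATH?)" >&2
    return 1
  fi
}

# Load Rye's environment if the installer created the env file, then check for rye.
if [ -f "$HOME/.rye/env" ]; then
  . "$HOME/.rye/env"
fi
require_cmd rye || echo "open a new terminal or re-run: source \"\$HOME/.rye/env\""
```

If it reports rye as missing, the PATH change from .profile has usually not been picked up yet by your current shell.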
Then run the following in the CLI (replace the name and email with your own):
git clone https://github.com/raoulg/MADS-MachineLearning-course.git
git config --global user.name "Mona Lisa"
git config --global user.email "you@example.com"
cd MADS-MachineLearning-course/
rye sync
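After rye sync finishes, the project should have its own virtual environment running Python 3.10. A minimal sketch for checking this, assuming Rye places the environment in a .venv folder inside the project (its default); check_venv is a hypothetical helper, not part of Rye.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Rye or this repo): check that `rye sync`
# created PROJECT_DIR/.venv and that its interpreter is Python 3.10.
check_venv() {
  local project_dir="${1:-.}"
  local py="$project_dir/.venv/bin/python"
  if [ ! -x "$py" ]; then
    echo "no virtual environment in $project_dir; run 'rye sync' there first" >&2
    return 1
  fi
  local version
  version="$("$py" --version 2>&1)"   # e.g. "Python 3.10.x"
  case "$version" in
    "Python 3.10."*) echo "ok: $version" ;;
    *) echo "unexpected interpreter: $version" >&2; return 1 ;;
  esac
}
```

For example, run check_venv . from inside MADS-MachineLearning-course/ after rye sync; it prints ok: followed by the interpreter version when everything is in place.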
- Copy your local public SSH key (see the GitHub docs), then run:
cd ~/.ssh
nano authorized_keys
Paste your key on the second line (leave the first key there), save the file and exit. Then check with
cat authorized_keys
that your key has been added.
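If you prefer not to edit the file by hand in nano, the same step can be scripted. This is a minimal sketch, assuming bash and the default ~/.ssh/authorized_keys location; add_authorized_key is a hypothetical helper. It keeps the existing keys, appends yours only if that exact line is not there yet, and sets the conventional permissions (700 for the directory, 600 for the file).

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to an authorized_keys file,
# keeping any keys that are already there and skipping exact duplicates.
add_authorized_key() {
  local pubkey="$1"
  local auth_file="${2:-$HOME/.ssh/authorized_keys}"
  local ssh_dir
  ssh_dir="$(dirname "$auth_file")"

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"        # conventional permissions for ~/.ssh
  touch "$auth_file"
  chmod 600 "$auth_file"      # only you may read or write the key list

  # If the last existing line has no trailing newline, add one first,
  # so the new key is not glued onto the previous key.
  if [ -s "$auth_file" ] && [ -n "$(tail -c 1 "$auth_file")" ]; then
    printf '\n' >> "$auth_file"
  fi

  if grep -qxF -- "$pubkey" "$auth_file"; then
    echo "key already present"
  else
    printf '%s\n' "$pubkey" >> "$auth_file"
    echo "key added"
  fi
}
```

Pass the full public-key line (it typically starts with ssh-ed25519 or ssh-rsa) in quotes as the first argument, then verify with cat authorized_keys as above.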
I know some of you still skipped the video. OK, I get that, but now actually watch it: introduction video about Rye.