Welcome to the Lisbon Data Science Academy Batch 4 Students repository!
Here you'll find all the information needed to set up your environment and the workflow you'll use during the academy.
IMPORTANT: You must complete these instructions before the bootcamp; this is essential.
Once you complete the setup, mark yourself as done on this spreadsheet.
By completing this you will set up and learn about all the tools you'll be using during the academy. We will also be able to identify any problems in time to figure out a solution.
This section deals with setting up either Windows Subsystem for Linux (WSL) or VMware. If you are using macOS or Linux you can skip this section.
If you are using Windows 10, we suggest using WSL (see below). If you are using an older Windows version, we also support running a virtual Linux machine with VMware.
Why do I need to install either WSL or VMware?
Because command line syntax differs between Windows and macOS/Linux, it would be a great challenge for us to support and provide instructions for both operating systems. For this reason, we ask you to install Windows Subsystem for Linux or VMware, either of which lets you run Linux commands inside Windows. Keep in mind that these are simply extensions to your Windows operating system, so installing them will not change anything else on your laptop. It is also quick to do.
Follow this guide if you are running Windows 10.
If you are running an older version of Windows (such as Windows 8 or 7), follow the guide below on running Ubuntu with Windows using VMware Player. You'll need to download VMware and Ubuntu 18; please use the links provided below (not the links provided in the tutorial).
- VMware download link
- Ubuntu download link
- Follow this guide: How To Run Ubuntu in Windows 7 with VMware Player
You'll now need to install a couple of packages, which can be done in a terminal by running:
sudo apt update && sudo apt upgrade && sudo apt install python3-pip python3-venv
Some of the steps in the following sections require Homebrew for macOS. Homebrew makes it easier to install software that we will use later on. To open the terminal, choose one:
- In Finder, open the /Applications/Utilities folder, then double-click Terminal.
- Press cmd + space, type terminal, and press enter.
The terminal should now be open:
Copy and paste the following line in the terminal:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
You may be prompted to install the Command Line Developer Tools. Confirm, and once that is finished, continue installing Homebrew by pressing enter again.
You will also need to install Python, which can be done in a terminal by running:
brew install python
So you're using Ubuntu, huh? Well, kudos to you.
You just need to install a couple of packages, which can be done in a terminal by running:
sudo apt update && sudo apt upgrade && sudo apt install python3-pip python3-venv
Below are the instructions that are enough to get the setup done and get you up and running :) You can also follow this guide for a more in-depth set of instructions that accomplish exactly the same thing.
You should always be using a virtual environment to install python packages. We'll use venv to set them up.
To install and update packages, we'll be using pip which is the reference Python package manager.
python3 -m pip install --user --upgrade pip setuptools wheel
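Before creating the virtual environment, it can help to confirm that the tools this guide relies on are actually on your PATH. The following is a minimal sketch, not part of the official setup: the helper name `check_tool` is made up for illustration, and the list of tools checked (python3 and git) is an assumption based on the commands used in this guide.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found" >&2
        return 1
    fi
}

# Check the tools used throughout this guide.
for tool in python3 git; do
    check_tool "$tool" || echo "Install $tool before continuing."
done
```

If a tool is reported as missing, go back to the installation step for your operating system before moving on.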
- Create a virtual environment with the name slu00
python3 -m venv ~/.virtualenvs/slu00
- Activate the environment
source ~/.virtualenvs/slu00/bin/activate
Note: after you activate your virtual environment, you should see its name in parentheses at the far left of your command line, like this:
mig@my-machine % source ~/.virtualenvs/slu00/bin/activate
(slu00) mig@my-machine %
You can make sure your virtual environment is active using the which command (it outputs the location of your virtual environment's python installation):
(slu00) mig@my-machine % which python
/Users/mig/.virtualenvs/slu00/bin/python
Now update pip.
(slu00) mig@my-machine % pip install -U pip
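Besides looking at the prompt or running which, you can check for an active environment from a script. The sketch below relies on the VIRTUAL_ENV variable, which the venv activate script sets; the function name `in_venv` is made up for illustration.

```shell
# Hypothetical helper: succeeds when a virtual environment is active
# in the current shell (activate sets the VIRTUAL_ENV variable).
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "Active virtual environment: $VIRTUAL_ENV"
else
    echo "No virtual environment is active."
fi
```

This can be handy as a guard before running pip install, so packages don't end up outside the environment by accident.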
Having a GitHub account and knowing the basics of committing and pushing changes are mandatory. By the end of this setup you will have accomplished both. Complete the following steps:
- Sign up for a GitHub account if you don't already have one.
- Checking for existing SSH keys
- Generating a new SSH key and adding it to the ssh-agent
- Adding a new SSH key to your GitHub account
- Testing your SSH connection
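As a rough companion to the linked GitHub guides, the sketch below lists any public keys you already have. The helper name `list_public_keys` is made up for illustration, and the email address in the comments is a placeholder. The key generation and connection test are interactive, so they are only shown as comments.

```shell
# Hypothetical helper: list the public keys found in an SSH directory
# (defaults to ~/.ssh). Returns non-zero when none are found.
list_public_keys() {
    ssh_dir=${1:-$HOME/.ssh}
    found=1
    for key in "$ssh_dir"/*.pub; do
        [ -e "$key" ] || continue
        echo "$key"
        found=0
    done
    return "$found"
}

list_public_keys || echo "No public keys found; generate one as described in the GitHub guide."

# The remaining steps are interactive, so they are shown as comments:
#   ssh-keygen -t ed25519 -C "your_email@example.com"   # generate a new key
#   ssh -T git@github.com                               # test the connection
```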
It's good practice to store your work with version control. In this academy that is a requirement as it is how you will make your work available to us.
- Log into GitHub
- Create a new private GitHub repository called batch4-workspace, see Creating a new repository.
IMPORTANT The repo MUST be named batch4-workspace!
If you name it anything else, you will be unable to submit any of your work!
- You need to explicitly select Private - This is your work and nobody else's. You will be graded based upon the merits of what you are able to do here so this should not be open to the world while you are working on it. Maybe after the course is completed, you can open-source it but not this time.
- Initialize with a README. This is mostly just so that you don't initialize an empty repo.
- Add a Python .gitignore. This step is insanely important. If you don't do this, you may end up checking things into the repo that make it un-gradeable by our grading system. ADD THE .gitignore PLEASE!!!! <--- 4 ! isn't enough
Since the repository is private you will have to explicitly give access so that our grading system can fetch the repository. To do this you will be adding a deploy key to the repository, which we provide to you in our Portal.
- Head on to the Portal
- Log in with your GitHub account
- Go to your profile and copy the deploy key (including the ssh-rsa part)
- Go back to the repository you have just created
- Go to Settings > Deploy Keys
- Click "Add deploy key" (no need to grant Write Access)
- Give it a recognizable name like "grader" and paste the key from the Portal
- Open a Terminal or Git Bash; the next steps are done in this terminal
- Clone your <username>/batch4-workspace repository. If you're not sure where to clone the repository, you can create a ~/projects folder and clone it there
git clone git@github.com:<username>/batch4-workspace.git
You will be cloning the batch4-students repository. All of the learning material you need will be made available on this repo as the academy progresses.
- Open a Terminal or Git Bash, the next steps are on this terminal
- Clone the students repository batch4-students
git clone https://github.com/LDSSA/batch4-students.git
Or if you have your ssh keys set up:
git clone git@github.com:LDSSA/batch4-students.git
In the batch4-students repository that you just cloned there is a sample learning unit.
It's used to give instructors guidelines to produce the learning units.
We are also using it to ensure that you are able to run and submit a learning unit.
So go ahead and copy the sample directory sample/SLU00 - LU Tutorial from the batch4-students repository to your repository (named batch4-workspace).
The grader only requires you to have the contents in a directory starting with the learning unit's ID, but we highly advise keeping the same directory structure as the students repository. All learning units are organized as:
<specialization ID> - <specialization name>/<learning unit ID> - <learning unit name>
Doing so will help you keep organized and ease copying data from the students repository to yours.
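One way to do the copy while keeping the same directory structure is sketched below. The function name `copy_unit` is made up for illustration, and the example paths assume both repositories were cloned into ~/projects. The helper creates the parent directory if needed and refuses to overwrite a unit that already exists in your workspace, so it cannot clobber work you have already done.

```shell
# Hypothetical helper: copy one learning unit from the students repo into
# the workspace repo, keeping the same relative path in both.
copy_unit() {
    students_repo=$1
    workspace_repo=$2
    unit_path=$3      # e.g. "sample/SLU00 - LU Tutorial"
    target_parent="$workspace_repo/$(dirname "$unit_path")"
    if [ -e "$workspace_repo/$unit_path" ]; then
        echo "Already exists, not overwriting: $workspace_repo/$unit_path" >&2
        return 1
    fi
    mkdir -p "$target_parent" && cp -r "$students_repo/$unit_path" "$target_parent/"
}

# Example call (paths are assumptions, adjust them to where you cloned):
#   copy_unit ~/projects/batch4-students ~/projects/batch4-workspace "sample/SLU00 - LU Tutorial"
```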
All learning units come as a set of Jupyter Notebooks (and some links to presentations). Notebooks are documents that can contain text, images and live code that you can run interactively.
In this section we will launch the Jupyter Notebook application. The application is accessed through the web browser.
Once you have the application open, feel free to explore the sample learning unit structure. It will give you a handle on what to expect and what rules the instructors follow (and the effort they put in) when creating a learning unit.
So let's start the Jupyter Notebook app:
- Activate your virtual environment
source ~/.virtualenvs/slu00/bin/activate
- Enter the Learning unit directory in your workspace directory (batch4-workspace).
Note: It is VERY IMPORTANT that you ALWAYS work on the files in your batch4-workspace repository, and NEVER work on files that are in your batch4-students repository!
cd ~/projects/batch4-workspace/sample/"SLU00 - LU Tutorial"
- Install the necessary packages
pip install -r requirements.txt
- Run the jupyter notebook
jupyter notebook
Windows 10 note: if you are running Windows 10 with WSL, the command to run the jupyter notebook is instead:
jupyter notebook --NotebookApp.use_redirect_file=False
When you run the jupyter notebook command, you should see something similar to this in your terminal:
Your browser should pop up with Jupyter open. If this does not happen, simply copy the link you see in your terminal (the one that contains localhost) and paste it in your browser's address bar:
Note: If you see these scary-looking error messages, don't worry, you can just ignore them.
Make sure you open and go through the Learning Notebook first.
Every learning unit contains an exercise notebook with exercises you will work on. So let's have a look at the sample Learning Unit.
- On the Jupyter Notebook UI in the browser open the exercise notebook
- Follow the instructions provided in the notebook
Besides the exercises and the cells for you to write solutions, you will see other cells with a series of assert statements.
This is how we (and you) will determine if a solution is correct.
If all assert statements pass, meaning you don't get an AssertionError or any other kind of exception, the solution is correct.
Once you've solved all of the notebook, we recommend following this simple checklist to avoid surprises.
- Save the notebook (again)
- Run "Restart & Run All"
- At this point the notebook should have run without any failing assertions
If you want to submit your notebook before it is all the way done to check intermediate progress, feel free to.
If you are able to go through the entire process and get a passing grade on the sample LU you'll have a good understanding of the same flow that you'll use for all LUs throughout the academy.
Now you have worked on the sample learning unit and you have some uncommitted changes.
It's time to commit the changes, which just means adding them to your batch4-workspace repository history, and pushing this history to your remote on GitHub.
- Using the terminal commit and push the changes
git add .
git commit -m 'Testing the sample notebook'
git push
- Go to the Portal and select the learning unit
- Select "Grade"
- After grading is complete you should have 20/20
- If everything passes locally but the grader doesn't give you the expected output, head to our troubleshooting section
- Once you have your grade don't forget to do the spreadsheet thing.
You will need to follow this workflow whenever new learning materials are released.
Learning units will be announced in the academy's #announcements channel. At this point they are available in the batch4-students repository. A new Learning Unit is released every Monday, and its solutions are then released the next Monday.
The steps you followed during the initial setup are exactly what you are going to be doing for each new Learning Unit. Here's a quick recap:
- Once a new Learning Unit is available, pull the changes from the batch4-students repo: enter the ~/projects/batch4-students/ folder using the cd command, then use the git pull command:
cd ~/projects/batch4-students/
git pull
- Copy the Learning Unit to your batch4-workspace repo
cp -r ~/projects/batch4-students/"<specialization ID> - <specialization name>"/"<learning unit ID> - <learning unit name>" ~/projects/batch4-workspace/"<specialization ID> - <specialization name>"
For example, for the S01 - Bootcamp and Binary Classification and SLU01 - Pandas 101, it would look like this:
cp -r ~/projects/batch4-students/"S01 - Bootcamp and Binary Classification"/"SLU01 - Pandas 101" ~/projects/batch4-workspace/"S01 - Bootcamp and Binary Classification"
- Create a new virtual environment for the Learning Unit you'll be working on. To do this, run the following command:
python3 -m venv ~/.virtualenvs/<learning unit ID>
replacing <learning unit ID> with the learning unit ID, so that for SLU01, for example, the command would be:
python3 -m venv ~/.virtualenvs/slu01
- Activate your virtual environment
source ~/.virtualenvs/slu01/bin/activate
- Install the python packages from requirements.txt for the specific Learning Unit (you must do this for each Learning Unit, and there are multiple Learning Units in a Specialization)
pip install -r ~/projects/batch4-workspace/"<specialization ID> - <specialization name>"/"<learning unit ID> - <learning unit name>"/requirements.txt
For example, for the S01 - Bootcamp and Binary Classification and SLU01 - Pandas 101, it would look like this:
pip install -r ~/projects/batch4-workspace/"S01 - Bootcamp and Binary Classification"/"SLU01 - Pandas 101"/requirements.txt
- Change to the batch4-workspace dir
cd ~/projects/batch4-workspace
- Open Jupyter Notebook
jupyter notebook
- Work
- Once all tests pass or once you're happy, save your work, close the browser tab with the Jupyter Notebook, close the terminal and open a new terminal
- Then commit the changes and push
cd ~/projects/batch4-workspace
git add .
git commit -m "Worked on SLU01 exercises"
git push
- Profit
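The recap names each virtual environment after the lowercase learning unit ID (slu01 for SLU01 - Pandas 101). If you want to derive that name from the directory name instead of typing it, something like the sketch below would work. The function name `unit_venv_name` is made up for illustration, and it assumes the directory name starts with the ID followed by a space.

```shell
# Hypothetical helper: derive the virtual environment name from a learning
# unit directory name, e.g. "SLU01 - Pandas 101" -> "slu01".
unit_venv_name() {
    unit_id=${1%% *}    # keep the text before the first space
    printf '%s\n' "$unit_id" | tr '[:upper:]' '[:lower:]'
}

unit="SLU01 - Pandas 101"
# Print the command you would run for this unit.
echo "python3 -m venv ~/.virtualenvs/$(unit_venv_name "$unit")"
```

For the example unit this prints the same python3 -m venv ~/.virtualenvs/slu01 command used in the recap.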
As much as we try to have processes in place to prevent errors and bugs in the learning units, some make it through to you. If the problem is not in the exercise notebook, you can just pull the new version from the students repo and replace the file. If the correction is in the exercise notebook, however, you can't just replace the file: your work is there and you'll lose it!
When a new version of the exercise notebook is released (and announced), two things will happen. If you submit an old version of the notebook, it will be flagged as out of date and not graded. You will have to merge the work you've already done into the new version of the notebook.
At the moment our suggestion to merge the changes is:
- Rename the old version
- Copy the new exercise notebook over
- Open both and copy paste your solutions to the new notebook
We understand it's not ideal and are working on improving this workflow using nbdime. If you are comfortable installing Python packages you can try it out, but we offer no support for this at the moment.
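The first two manual steps (rename the old version, copy the new one over) can be sketched as a small helper. The function name `update_exercise_notebook` and the -old suffix are made up for illustration. It refuses to run if a backup already exists, so an earlier backup of your solutions is never overwritten. Copying your solutions across still has to be done by hand in Jupyter.

```shell
# Hypothetical helper: keep your current notebook as a backup, then put the
# corrected notebook in its place.
update_exercise_notebook() {
    workspace_nb=$1   # your notebook, containing your solutions
    new_nb=$2         # the corrected notebook from batch4-students
    backup="${workspace_nb%.ipynb}-old.ipynb"
    if [ -e "$backup" ]; then
        echo "Backup already exists, not overwriting: $backup" >&2
        return 1
    fi
    mv "$workspace_nb" "$backup" && cp "$new_nb" "$workspace_nb"
}

# Example call (file names and paths are placeholders):
#   update_exercise_notebook "Exercise notebook.ipynb" /path/to/new/"Exercise notebook.ipynb"
```

Afterwards, open both notebooks and copy your solutions from the -old file into the new one.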
During the academy you will surely run into problems and have doubts about the material. We provide you with some different channels to ask for help.
If you feel something is not clear enough or there is a bug in the learning material please follow these steps. Remember, there is no such thing as a dumb question, and by asking questions publicly you will help others!
If you have more conceptual questions about the materials or how to approach a problem, you can also reach out to the instructors on Slack. You can find the main contact for the learning unit in the Portal; this instructor can help you out or redirect you to someone who is available at the moment.
Are you getting different results locally than what you are getting in the Portal? If so, we will first ask you to do a bit of troubleshooting.
- Ensure that you have saved the changes in the notebook
- Ensure that you have committed and pushed the changes
- Ensure that you are not using packages that are not present in the original requirements.txt file (changes to this file or your local environment have no effect)
- In the learning unit page in the Portal you can download the exercise notebook with the results of the grader by clicking your grade; have a look to figure out what went wrong.
If none of these steps helped, go ahead and open a support ticket for the portal here.
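To check the "committed" part of the list from the terminal, a sketch along these lines can help. The function name `has_uncommitted_changes` is made up for illustration, and the example path assumes the workspace was cloned into ~/projects. It relies on git status --porcelain, which prints one line per modified, staged, or untracked file and nothing when the working tree is clean.

```shell
# Hypothetical helper: succeeds when the repository at $1 has changes that
# are not committed yet (modified, staged, or untracked files).
has_uncommitted_changes() {
    [ -n "$(git -C "$1" status --porcelain)" ]
}

# Example (the path is an assumption, adjust it to where you cloned):
#   if has_uncommitted_changes ~/projects/batch4-workspace; then
#       echo "You still have work to commit and push."
#   fi
```

This only covers uncommitted work. To see whether commits still need pushing, run git status in the repository and look for a message saying your branch is ahead of the remote.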
Is the Portal down or acting out in some unexpected way? Then please open a support ticket for the portal here.
- When I open Windows Explorer through Ubuntu it goes to a different folder than in the guide
- Ubuntu on Windows 10 high CPU usage crashes
- When I pull from the batch4-students repository I get an error
- When I try to open jupyter notebook I get an error
- When I use the cp command the > sign appears and the command does not execute
- My problem is not listed here, what should I do?
Please make sure:
- you are running the command explorer.exe . including the dot at the end.
- you are running Windows 10 version 1909 or newer.
- First, please make sure you are running Windows 10 version 1909 or newer.
- Then, try following these steps
error: Your local changes to the following files would be overwritten by merge:
<some files>
Please commit your changes or stash them before you merge.
Aborting
git is telling us that you made changes to the files in the ~/projects/batch4-students folder, and it is not pulling the changes made by the instructors because they would overwrite yours. To fix this, do the following:
- make sure that any change you made to the files in ~/projects/batch4-students (that you don't want to lose) is saved in your ~/projects/batch4-workspace repository (see https://github.com/LDSSA/batch4-students#updates-to-learning-units for how to do this). If you don't want to keep the changes you made to these files, just continue on to the next step
- go to the ~/projects/batch4-students folder and run:
cd ~/projects/batch4-students
git stash
- now you can pull from the batch4-students repository:
git pull
migs-MBP% jupyter notebook
zsh: command not found: jupyter
Before opening jupyter notebook, activate your virtual environment:
source ~/.virtualenvs/slu00/bin/activate
cp -r ~/projects/batch4-students/"S01 - Bootcamp and Binary Classification"/"SLU01 - Pandas 101" ~/projects/batch4-workspace/"S01 - Bootcamp and Binary Classification"
>
Make sure to use this type of quotes " and not these ones “.
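Curly quotes tend to sneak in when a command is copied from a document or chat message rather than typed. If you are unsure whether a command contains them, a check like the one below can tell you. The function name `has_curly_quotes` is made up for illustration, and the sample command is deliberately wrong.

```shell
# Hypothetical helper: succeeds when the given text contains curly double
# quotes (“ or ”), which the shell does not treat as quoting characters.
has_curly_quotes() {
    case $1 in
        *“*|*”*) return 0 ;;
        *) return 1 ;;
    esac
}

# A deliberately broken command, stored as plain text for the check.
command_text='cp -r ~/projects/batch4-students/“S01 - Bootcamp and Binary Classification”'
if has_curly_quotes "$command_text"; then
    echo 'Curly quotes found: replace them with straight quotes (").'
fi
```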
If the above steps didn't solve the problem for you, please contact us on Slack or, if you are not on Slack, open an issue.
If your problem doesn't fit in any of the previous categories, head over to Slack and ask. Someone will surely point you in the right direction.
If you're looking for some specific part of our organization head over to the Member Directory and search for the area of responsibility you're looking for.