
boyanangelov/oreilly_python_r_training


Python and R for Data Science (O'Reilly Live Online Training)

Setup

Python

If Python is not already installed on your machine, download it from python.org. Once it is installed, create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate
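To confirm that the virtual environment is actually active before installing anything, a quick check like the one below can help. This is a minimal sketch that uses only the standard library. Inside a venv, sys.prefix points at the environment directory, while sys.base_prefix still points at the base Python installation.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # venv sets sys.prefix to the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; run `source venv/bin/activate` first.")
```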

Then install the required packages, either individually with pip or all at once with:

pip install -r requirements.md

To start using JupyterLab:

jupyter lab

If you prefer a text editor, VSCode is recommended. Most of the R work will be done in RStudio, but one short section uses JupyterLab, so you also need to install IRkernel by following its official setup instructions.
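To check that both the Python and R kernels are visible to JupyterLab, you can ask Jupyter for its registered kernelspecs. The sketch below shells out to `jupyter kernelspec list --json` and returns the kernel names. It assumes that IRkernel was registered under its default name, "ir", and that the Python kernel appears as "python3". Kernels installed under other names will not match these checks.

```python
from __future__ import annotations

import json
import shutil
import subprocess


def installed_kernels() -> set[str]:
    """Return the names of registered Jupyter kernels (empty set if Jupyter is unavailable)."""
    if shutil.which("jupyter") is None:
        return set()
    try:
        result = subprocess.run(
            ["jupyter", "kernelspec", "list", "--json"],
            capture_output=True,
            text=True,
            check=True,
        )
        # The JSON output maps kernel names to their specs under "kernelspecs".
        return set(json.loads(result.stdout)["kernelspecs"])
    except (subprocess.CalledProcessError, OSError, json.JSONDecodeError, KeyError):
        return set()


if __name__ == "__main__":
    kernels = installed_kernels()
    # Assumed default names: "python3" for ipykernel, "ir" for IRkernel.
    for name in ("python3", "ir"):
        print(f"{name}: {'found' if name in kernels else 'missing'}")
```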

R

Download R from CRAN, then install RStudio from its official website. Next, create a new R Project (choose "Existing Directory" and select the folder where you downloaded this repository) and install the required packages individually with install.packages().

Required packages

Python

  • jupyterlab (with Python and R kernels)
  • pandas
  • scikit-learn
  • yellowbrick
  • missingno
  • seaborn
  • opencv-python
  • scikit-image
  • nltk (download data as well)
  • spaCy (download model with python -m spacy download en_core_web_sm)
  • flask (optional)
  • mlflow (optional)
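Several of the packages above are imported under a different name than the one used with pip, such as scikit-learn as sklearn and opencv-python as cv2. The sketch below checks which ones are importable in the active environment without actually importing them. The pip-to-module mapping reflects each package's usual import name. The spaCy model is included because it installs as a Python package named en_core_web_sm. NLTK data downloads are not covered by this check.

```python
from __future__ import annotations

import importlib.util

# pip/package names from the list above -> module name used in `import`.
REQUIRED = {
    "jupyterlab": "jupyterlab",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "yellowbrick": "yellowbrick",
    "missingno": "missingno",
    "seaborn": "seaborn",
    "opencv-python": "cv2",
    "scikit-image": "skimage",
    "nltk": "nltk",
    "spaCy": "spacy",
    "en_core_web_sm (spaCy model)": "en_core_web_sm",
}
OPTIONAL = {"flask": "flask", "mlflow": "mlflow"}


def missing_modules(packages: dict[str, str]) -> list[str]:
    """Return the package names whose import module cannot be located."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name, module in packages.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    for label, pkgs in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_modules(pkgs)
        print(f"Missing {label} packages: {', '.join(missing) or 'none'}")
```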

R

  • ggplot2
  • dplyr
  • prophet
  • xts
  • leaflet
  • shiny
  • flexdashboard (optional)
  • shinydashboard (optional)
  • sdmbench (optional)

Datasets

  • diamonds.csv (built-in from ggplot2)
  • boston housing (built-in from scikit-learn versions before 1.2, where load_boston was removed)
  • wine (built-in from scikit-learn)
  • fires.csv (from Kaggle)
  • reviews.csv (from UCSD)
  • image_000001.jpg (flowers102 from University of Oxford)
  • starwars.csv (built-in from dplyr)
  • temps.csv (from Machine Learning Mastery Github)
  • quakes (built-in from R's datasets package)
  • data_for_ml.csv (intermediate dataset for case study - processed fires.csv)
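The files that do not come bundled with a library (fires.csv, reviews.csv, image_000001.jpg, temps.csv and data_for_ml.csv) have to be present on disk. The sketch below reports which of them are missing from a given folder and shows the header row of each CSV it finds. The folder location is an assumption, so pass wherever you placed the files. Non-UTF-8 bytes in a header are replaced rather than raising an error.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Datasets from the list above that are not built into a library.
EXTERNAL_FILES = ("fires.csv", "reviews.csv", "image_000001.jpg", "temps.csv", "data_for_ml.csv")


def check_datasets(folder: str | Path, names: tuple[str, ...] = EXTERNAL_FILES) -> dict[str, list[str] | None]:
    """Map each file name to its CSV header ([] for non-CSV files), or None if it is missing."""
    folder = Path(folder)
    report: dict[str, list[str] | None] = {}
    for name in names:
        path = folder / name
        if not path.is_file():
            report[name] = None
        elif path.suffix == ".csv":
            # Read only the first row to show the column names.
            with path.open(newline="", encoding="utf-8", errors="replace") as f:
                report[name] = next(csv.reader(f), [])
        else:
            report[name] = []
    return report


if __name__ == "__main__":
    for name, header in check_datasets(".").items():
        print(f"{name}: {'MISSING' if header is None else header or 'present'}")
```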