CRT outreach program to assist life scientists in learning basic data science skills.

Genomics-CRT/Data-Science-For-Life-Science

Workshop Outline

The workshop will follow this weekly schedule for the duration of the course:

  • Monday: A markdown file containing the learning materials for the week's topics. These are textbook-style markdown documents with inline code examples.
  • Monday: An R Markdown worksheet and a Python worksheet, which you can download and open in RStudio and Spyder respectively. They contain exercises for participants to complete, based on the markdown file released on the same day.
  • Thursday: Zoom meeting. Course coordinators will host a conference call in which they share their screen with participants. Solutions and a recording of the tutorial will be posted as soon as possible afterwards.
  • Thursday: Solutions to the worksheet are posted after the Zoom meeting, so that those who could not attend can still access them.

Week 1 is an exception, as it is dedicated to installing software. The Zoom meeting will mostly be an informal meet and greet. Participants with installation issues or queries can raise them during this meeting; week 1 is the only Zoom meeting in which such issues will be addressed.

Troubleshooting Issues

For the duration of the course, participants are encouraged to use the GitHub "Issues" tab to post code queries or problems. This is by far the easiest way for course coordinators to debug issues you may be having. Debugging individual issues over Zoom does not benefit all participants and is time-consuming for course coordinators.

When an issue has been solved, it will be marked "closed" and will not appear in the open issues tab. Check the 'closed' issues tab for archived posts.

Week 1 (13/04/2020)

Installing Dependencies

We assume that most participants use either macOS or Windows, so it is crucial that every participant has access to the same software. For this we have decided to use Anaconda, a Python and R distribution built around the conda package manager that can be installed on Windows, macOS and Linux. The first week of the workshop will be dedicated to making sure each participant has a fully working installation of Anaconda.

A tutorial on how to install Anaconda is available here.

Creating a GitHub account

GitHub is a free website where users can access code repositories and create their own repositories to store code, notes and small data files (max 100MB). As the workshop is being conducted via GitHub, we strongly encourage participants to create a GitHub account.

A guide on how to set up a GitHub account and navigate the website and the workshop repository is available here.

Linux Subsystem (Windows Users)

UNIX shell scripting was a late addition to the workshop and will be covered in week 7. Participants using Linux or macOS do not need to follow this installation step, as both are Unix-like systems that already provide a shell and the standard command-line tools.

Windows 10 users can install the Windows Subsystem for Linux (WSL). WSL provides a Linux environment that runs on top of Windows and lets most native Linux command-line tools, utilities and binaries run on Windows: you can run Bash scripts and popular command-line tools such as sed, awk, grep, sort, apt and ssh. This will allow most participants to take part in the week 7 shell scripting exercises.

A tutorial for Windows users on installing WSL has been prepared here.

If you don't have Windows 10, an alternative installation using Cygwin is described here.

Zoom meeting

  • Thursday 16th 2-3pm.

Week 2 (20/04/2020)

Introduction to R (part 1)

A gentle introduction to RStudio and the R programming language, covering the basic syntax of R. Topics covered include data structures in R, creating and calling variables, logical operators, conditional statements, vectors, functions, for loops, while loops and loading packages in R.

Resources will be released on 20/04/2020.

Introduction to Python (part 1)

A gentle introduction to the Spyder GUI and the Python programming language, covering the basic syntax of Python. Topics covered include data structures in Python, creating and calling variables, logical operators, conditional statements, for loops and while loops.

Resources will be released on 20/04/2020.
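
As a taste of the Python topics listed above, here is a minimal sketch that combines a list, a dictionary, a logical operator, a conditional statement, a for loop and a while loop. The gene names and expression values are made up for illustration and are not taken from the workshop materials.

```python
# Data structures: a list of gene names and a dictionary of
# (made-up) expression values keyed by gene name.
genes = ["BRCA1", "TP53", "EGFR"]
expression = {"BRCA1": 2.5, "TP53": 8.1, "EGFR": 0.4}

# For loop with a conditional statement and a logical operator (and).
highly_expressed = []
for gene in genes:
    value = expression[gene]
    if value > 2.0 and gene != "EGFR":
        highly_expressed.append(gene)

# While loop: keep halving a value until it drops below 1.0,
# counting how many halvings were needed.
value = 8.1
halvings = 0
while value >= 1.0:
    value /= 2
    halvings += 1

print(highly_expressed)
print(halvings)
```

You can paste this into the Spyder editor and run it to check that your Python installation works.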

Zoom meeting

  • Thursday 23rd 2-4pm

Week 3 (27/04/2020)

Introduction to R (part 2)

Working with matrices in R, reading text/csv files into dataframes, and performing manipulations, operations and subsetting using base R functions.

Introduction to Python (part 2)

Working with dataframes in Python using numpy and pandas libraries. Perform operations and tasks on the dataframes and write to files.
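
The sketch below illustrates the kind of dataframe workflow described above: reading CSV data, subsetting rows, adding a column, summarising by group and writing the result out. The column names and counts are invented for illustration, and an in-memory buffer stands in for files on disk; it assumes numpy and pandas are installed (both ship with Anaconda).

```python
import io

import numpy as np
import pandas as pd

# Made-up CSV content standing in for a file on disk.
csv_text = """sample,condition,count
s1,control,10
s2,control,14
s3,treated,30
s4,treated,26
"""

# Read the CSV text into a dataframe.
df = pd.read_csv(io.StringIO(csv_text))

# Subset rows with a boolean mask.
treated = df[df["condition"] == "treated"]

# Add a derived column using a numpy function.
df["log2_count"] = np.log2(df["count"])

# Summarise: mean count per condition.
mean_counts = df.groupby("condition")["count"].mean()

# Write the dataframe back out as CSV (here to an in-memory buffer).
out = io.StringIO()
df.to_csv(out, index=False)

print(treated)
print(mean_counts)
```

To work with real files, replace the `io.StringIO` objects with file paths, for example `pd.read_csv("counts.csv")`, where `counts.csv` is a hypothetical file name.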

Zoom meeting

  • Thursday 30th 2-4pm


Week 4 (04/05/2020)

Introduction to R (part 3)

This tutorial covers creating plots using base R and then extends to the ggplot2 package. Further visualization packages are listed in the teaching materials.

Introduction to Python (part 3)

This tutorial covers creating plots in Python, using the popular matplotlib library and the increasingly popular seaborn library.

Zoom meeting

  • Thursday 7th 2-4pm


Week 5 (11/05/2020)

Math and Statistics (R)

This tutorial has 3 parts:

1) Descriptive statistics for single variable and multivariable datasets including measures of central tendency, variability and quantiles, along with distributions, the Central Limit Theorem and confidence interval for the mean.

2) Hypothesis testing in the form of two-sample comparisons (the t-test and non-parametric tests) and correlation tests (parametric and non-parametric)

3) Linear regression analysis

Math and Statistics (Python)

This tutorial is split into 3 parts and covers:

1) Descriptive statistics for single variable and multivariable datasets including measures of central tendency, variability and quantiles, along with distributions, the Central Limit Theorem and confidence interval for the mean.

2) Hypothesis testing in the form of two-sample comparisons (the t-test and non-parametric tests) and correlation tests (parametric and non-parametric)

3) Linear regression analysis
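
The three parts above can be sketched with nothing but the Python standard library. This is a rough illustration under stated assumptions, not the workshop's own material: the measurements are made up, the confidence interval uses a normal approximation (for samples this small a t-based interval is more appropriate), and the helper functions `welch_t` and `linear_fit` are hypothetical names written here for illustration. The tutorial itself may well use other libraries.

```python
import math
import statistics
from statistics import NormalDist

# Made-up measurements for two groups.
control = [4.1, 5.0, 4.7, 5.3, 4.9, 5.2]
treated = [5.9, 6.4, 6.1, 6.8, 6.0, 6.6]

# 1) Descriptive statistics: central tendency, variability, quartiles.
mean_c = statistics.mean(control)
sd_c = statistics.stdev(control)
quartiles_c = statistics.quantiles(control, n=4)

# Approximate 95% confidence interval for the mean (normal approximation).
z = NormalDist().inv_cdf(0.975)
half_width = z * sd_c / math.sqrt(len(control))
ci_c = (mean_c - half_width, mean_c + half_width)


# 2) Two-sample comparison: Welch's t statistic (unequal variances).
def welch_t(a, b):
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


t_stat = welch_t(treated, control)


# 3) Simple linear regression by ordinary least squares.
def linear_fit(x, y):
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


dose = [0, 1, 2, 3, 4]
response = [1.0, 3.1, 4.9, 7.2, 9.0]
slope, intercept = linear_fit(dose, response)

print(mean_c, sd_c, quartiles_c, ci_c)
print(t_stat)
print(slope, intercept)
```

The t statistic alone is not a p-value; turning it into one requires the t distribution, which the standard library does not provide.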

Zoom meeting

  • Thursday 14th 2-4pm (2-3pm: Python part; 3-4pm: R part)

Week 6 (18/05/2020)

Machine Learning (R)

Distance metrics and clustering methods in unsupervised machine learning, visualised as dendrograms and heatmaps. Dimensionality reduction using PCA, visualising principal components in biplots. Supervised machine learning, covering data pre-processing and cleaning, creating training and test sets, and implementing k-nearest neighbours (KNN), random forest (RF) and elastic net models.

Machine Learning (Python)

Zoom meeting
