This repository contains the projects I am completing for courses I'm taking on
the DataCamp website.
Projects are divided by language: Python, R, and SQL.
The Rebrickable database includes data on every LEGO set that has ever been
sold: the names of the sets, what bricks they contain, what color the bricks
are, etc. The bricks might be small, but this is big data! In this project, you
will get to explore the Rebrickable database. To do this, you need to know your
way around pandas DataFrames, and it's recommended that you take a look at the
courses pandas Foundations and Manipulating DataFrames with pandas.
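As a rough illustration of the kind of pandas work this involves, here is a minimal sketch that averages the number of parts per set by release year. It assumes a sets table with year and num_parts columns; the set numbers and values below are hypothetical, not taken from Rebrickable.

```python
import pandas as pd

# Hypothetical rows shaped like a sets table; column names and values are
# assumptions for illustration, not real Rebrickable data.
sets = pd.DataFrame({
    "set_num": ["set-a", "set-b", "set-c", "set-d"],
    "year": [1990, 1990, 2000, 2000],
    "num_parts": [100, 200, 300, 500],
})

# Average number of parts per set, grouped by release year
parts_by_year = sets.groupby("year")["num_parts"].mean()
print(parts_by_year)
```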
Click here to access a rendered version of the Jupyter notebook
In 1847, the Hungarian physician Ignaz Semmelweis made a breakthrough discovery:
he discovered handwashing. Contaminated hands were a major cause of childbed fever,
and by enforcing handwashing at his hospital he saved hundreds of lives.
In this Python project, we will reanalyze the medical data Semmelweis collected.
This project assumes that you are familiar with Python and pandas DataFrames.
You can learn the required skills in these courses: Intermediate Python for
Data Science and pandas Foundations.
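The core comparison in this kind of reanalysis is the proportion of deaths per birth before and after handwashing was introduced. The sketch below assumes monthly records with date, births and deaths columns and uses 1847-06-01 as the assumed start of handwashing; all counts are hypothetical.

```python
import pandas as pd

# Hypothetical monthly records; the real dataset's columns and values may differ.
monthly = pd.DataFrame({
    "date": pd.to_datetime(["1847-03-01", "1847-04-01", "1847-05-01",
                            "1847-07-01", "1847-08-01", "1847-09-01"]),
    "births": [300, 250, 200, 250, 200, 300],
    "deaths": [30, 25, 20, 5, 4, 6],
})

# Assumed date from which handwashing was enforced
handwashing_start = pd.Timestamp("1847-06-01")

# Proportion of deaths per birth for each month
monthly["proportion_deaths"] = monthly["deaths"] / monthly["births"]

before = monthly[monthly["date"] < handwashing_start]
after = monthly[monthly["date"] >= handwashing_start]

mean_before = before["proportion_deaths"].mean()
mean_after = after["proportion_deaths"].mean()
print(f"Mean proportion of deaths before: {mean_before:.3f}")
print(f"Mean proportion of deaths after:  {mean_after:.3f}")
print(f"Difference: {mean_after - mean_before:.3f}")
```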
Click here to access a rendered version of the Jupyter notebook
In this project, you will explore the market capitalization of different cryptocurrencies
to better understand the growth and impact of Bitcoin and other cryptocurrencies.
Warning: The cryptocurrency market is exceptionally volatile, and any money you put in
might disappear into thin air. Never invest money you can't afford to lose.
To complete this project, you need to be fluent in pandas DataFrames. Before
starting this project, we recommend that you complete the following courses:
pandas Foundations, Manipulating DataFrames with pandas, and Cleaning Data in Python.
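A typical first step is to compute each coin's share of the total market capitalization. This minimal sketch assumes a table indexed by coin id with a market_cap_usd column; the coins other than Bitcoin and Ethereum are placeholders, and every number is made up.

```python
import pandas as pd

# Hypothetical snapshot; ids and market caps (USD) are made up for illustration.
market = pd.DataFrame({
    "id": ["bitcoin", "ethereum", "coin-a", "coin-b"],
    "market_cap_usd": [600.0, 300.0, 60.0, 40.0],
}).set_index("id")

# Drop coins without a reported market cap, then compute each coin's share of the total
market = market.dropna(subset=["market_cap_usd"])
total = market["market_cap_usd"].sum()
market["share_pct"] = 100 * market["market_cap_usd"] / total

# Largest coins first
top = market.sort_values("market_cap_usd", ascending=False)
print(top)
```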
Click here to access a rendered version of the Jupyter notebook
Version control repositories like CVS, Subversion, or Git store rich information about the evolution of a software project. In this project, you'll be challenged to read in, clean up, and visualize a real-world Git repository dataset of the Linux kernel. With almost 700k commits and thousands of contributors (find out the exact number in this project ;-) ), you'll encounter a few data cleaning and wrangling challenges. But you'll also gain insights into the development activity over the last 13 years. For this project, you need to be familiar with pandas DataFrames (especially the read_csv and groupby functions) and with working with time series data.
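To show how read_csv, groupby and time series data fit together, here is a minimal sketch that parses a git log, counts distinct contributors, and counts commits per year. It assumes a log exported as "unix timestamp#author" lines; the format of the real project's file may differ, and the entries below are hypothetical.

```python
import io
import pandas as pd

# Hypothetical excerpt of a git log in "<unix timestamp>#<author>" format
raw_log = io.StringIO(
    "1104537600#Alice\n"
    "1104541200#Bob\n"
    "1136073600#Alice\n"
    "1136077200#Carol\n"
    "1136080800#Alice\n"
)

commits = pd.read_csv(raw_log, sep="#", header=None, names=["timestamp", "author"])

# Convert Unix epoch seconds into pandas datetimes
commits["timestamp"] = pd.to_datetime(commits["timestamp"], unit="s")

# Number of distinct contributors and number of commits per calendar year
n_authors = commits["author"].nunique()
commits_per_year = commits.groupby(commits["timestamp"].dt.year).size()
print(n_authors)
print(commits_per_year)
```

On the real dataset, you would likely also need to filter out implausible timestamps before grouping.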
Click here to access a rendered version of the Jupyter notebook
R is a tool for doing serious statistics and data analysis. But not everything
in life has to be serious; life is also beautiful, and R can make beautiful things
too. R can make art.
The arrangement of leaves on a plant stem is governed by spirals. This arrangement,
called phyllotaxis, is a nice example of how mathematics can describe
patterns in nature. In this project, we will use it to invent flowers.
This R project assumes you are familiar with the ggplot2 package. If you
don't know ggplot2, we recommend taking one of the courses Introduction
to the Tidyverse or Data Visualization with ggplot2 (Part 1). If you
want to see more examples of how you can use R to make art, check
out the Fronkonstin blog created by Antonio Sánchez Chinchón.
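The project itself is written in R with ggplot2. As a language-neutral illustration of the underlying math, here is a minimal Python sketch of Vogel's model, one common model of phyllotaxis. Seed i is placed at angle i times the golden angle and at radius sqrt(i). The function name phyllotaxis_points is hypothetical, and plotting the points (for example with ggplot2 in the R project) is left out.

```python
import math

# Golden angle in radians: pi * (3 - sqrt(5)), about 137.5 degrees
GOLDEN_ANGLE = math.pi * (3 - math.sqrt(5))


def phyllotaxis_points(n):
    """Return n (x, y) points of a Vogel-style sunflower spiral (hypothetical helper)."""
    points = []
    for i in range(1, n + 1):
        theta = i * GOLDEN_ANGLE  # each new seed turns by the golden angle
        r = math.sqrt(i)          # sqrt growth gives each seed roughly the same area
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


pts = phyllotaxis_points(500)
print(len(pts), math.degrees(GOLDEN_ANGLE))
```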
Click here to access a rendered version of the Jupyter notebook
When beginning a career in data science, one often wonders what programming tools and languages are being used in the industry, and what skills one should learn first. By exploring the 2017 Kaggle Data Science Survey results, you can learn about the tools used by 10,000+ people in the professional data science community. Before starting this project, you should be comfortable manipulating data frames and have some experience working with the tidyverse packages dplyr, tidyr, and ggplot2. This project uses a subset of the 2017 Kaggle Machine Learning and Data Science Survey dataset. If you want to know more about the tools and techniques Kaggle participants use, check out the full report of the Kaggle 2017 survey results.
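The project does this in R with the tidyverse. A central step is splitting each respondent's comma-separated list of tools and counting how many respondents use each one. Here is a rough pandas analogue, assuming a WorkToolsSelect column; the column name and responses are hypothetical.

```python
import pandas as pd

# Hypothetical survey responses: each respondent lists the tools they use,
# separated by commas (column name and values are made up for illustration).
responses = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "WorkToolsSelect": ["Python,R,SQL", "Python,SQL", "Python"],
})

# One row per (respondent, tool), then count how many respondents use each tool
tools = (
    responses.assign(tool=responses["WorkToolsSelect"].str.split(","))
    .explode("tool")
)
tool_counts = tools["tool"].value_counts()
print(tool_counts)
```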
Click here to access a rendered version of the Jupyter notebook
It's not only individuals who take on debt to manage their needs; a country may also take on debt to manage its economy. For example, infrastructure spending is one costly ingredient required for a country's citizens to lead comfortable lives. The World Bank is an international organization that lends money to countries. In this project, you are going to analyze international debt data collected by The World Bank. The dataset contains information about the amount of debt (in USD) owed by developing countries across several categories. You are going to find the answers to questions like:
- What is the total amount of debt owed by the countries listed in the dataset?
- Which country owes the maximum amount of debt, and what does that amount look like?
- What is the average amount of debt owed by countries across different debt indicators?
The data used in this project is provided by The World Bank. It contains both national and regional debt statistics for several countries across the globe as recorded from 1970 to 2015.
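The three questions above map naturally onto SQL aggregates. This minimal sketch runs them against an in-memory SQLite database through Python's sqlite3 module. It assumes a table named international_debt with country_name, indicator_name and debt columns; the table layout and the rows are illustrative assumptions, not World Bank data.

```python
import sqlite3

# In-memory database with a hypothetical international_debt table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE international_debt ("
    " country_name TEXT, indicator_name TEXT, debt REAL)"
)
conn.executemany(
    "INSERT INTO international_debt VALUES (?, ?, ?)",
    [
        ("Country A", "Indicator X", 100.0),
        ("Country A", "Indicator Y", 50.0),
        ("Country B", "Indicator X", 300.0),
        ("Country B", "Indicator Y", 10.0),
    ],
)

# 1. Total debt owed by all countries in the table
total_debt = conn.execute("SELECT SUM(debt) FROM international_debt").fetchone()[0]

# 2. Country with the highest total debt
top_country = conn.execute(
    "SELECT country_name, SUM(debt) AS total FROM international_debt "
    "GROUP BY country_name ORDER BY total DESC LIMIT 1"
).fetchone()

# 3. Average debt per indicator
avg_by_indicator = conn.execute(
    "SELECT indicator_name, AVG(debt) FROM international_debt "
    "GROUP BY indicator_name ORDER BY indicator_name"
).fetchall()

print(total_debt, top_country, avg_by_indicator)
```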
Click here to access a rendered version of the Jupyter notebook
An important part of business is planning for the future and ensuring that the company survives changing market conditions. Some businesses do this really well and last for hundreds of years. In this project, you'll explore data from BusinessFinancing.co.uk on the world's oldest businesses: when they were founded and which industries they belong to.
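To give a flavor of the questions involved, here is a minimal sketch that finds the oldest business overall, the earliest founding year in each industry, and how many businesses each industry has. It assumes business, year_founded and category columns; all names and years below are hypothetical.

```python
import pandas as pd

# Hypothetical records; business names, founding years and categories are made up.
businesses = pd.DataFrame({
    "business": ["Firm A", "Firm B", "Firm C", "Firm D"],
    "year_founded": [1200, 1500, 1400, 1700],
    "category": ["Construction", "Manufacturing", "Manufacturing", "Hospitality"],
})

# Oldest business overall
oldest = businesses.loc[businesses["year_founded"].idxmin()]

# Earliest founding year within each industry, and number of businesses per industry
per_category = businesses.groupby("category")["year_founded"].min()
counts = businesses["category"].value_counts()

print(oldest["business"], oldest["year_founded"])
print(per_category)
print(counts)
```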
Click here to access a rendered version of the Jupyter notebook