This is a collection of Data Science projects I have worked on.
Machine learning is a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to mimic how humans learn, gradually improving accuracy over time. Machine learning is all around us. As we interact with our banks, shop online, and use social media, machine learning algorithms come into play to make our experience efficient, smooth and secure.
Food is essential for human survival. Every one of us has, at some point, had a favourite food or a craving for a certain kind of food, and would likely go to great lengths to get it, either by cooking or by eating out. The objectives considered in this project include: What food do customers like the most? Does the weather influence what people eat? What is the preferred mode of transportation used by the restaurant? At what time do people most often pick up their orders? What is the correlation between the day of the week, the type of vehicle used to deliver food, and the distance covered?
This is a restaurant case study on Amazon business research. Collaborated with 8 other young women on this group project. The notebook for this research has more details and answers the questions posed. A presentation file is also included.
This task uses supervised machine learning to predict a student's percentage score based on the number of hours studied. It is a simple linear regression task, as it involves just two variables (i.e., hours and scores). The dataset used was obtained from http://bit.ly/w-data
After loading the dataset and plotting the distribution of scores, a positive linear relationship is observed between the number of hours studied and the percentage scored. The model predicts that a student who studies for 9.25 hours will score 92.7%.
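The notebook's own code is not reproduced here. As a minimal, library-free sketch of what simple linear regression computes, the block below fits a line by ordinary least squares (slope = covariance of x and y divided by the variance of x). The function names and the sample points are hypothetical and are not taken from the http://bit.ly/w-data dataset, so the printed prediction will not match the 92.7% reported above.

```python
# Minimal ordinary-least-squares fit for one predictor (hours -> score).
# The sample points in the demo are hypothetical, NOT the project's dataset.


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


if __name__ == "__main__":
    hours = [1.5, 3.2, 5.1, 7.7, 8.9]  # hypothetical study hours
    scores = [20, 27, 47, 85, 95]      # hypothetical percentage scores
    m, b = fit_simple_linear_regression(hours, scores)
    print(f"score = {m:.2f} * hours + {b:.2f}")
    print(f"predicted score for 9.25 hours: {predict(m, b, 9.25):.1f}%")
```

scikit-learn's `LinearRegression` solves the same least-squares problem; the hand-written version simply makes the arithmetic visible.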
This task uses unsupervised machine learning to find the optimum number of clusters and represent them visually, using the K-means clustering algorithm. The dataset used was obtained from https://bit.ly/3kXTdox
The elbow method was used to determine the optimum number of clusters for K-means, which was found to be 3. Data points were visualized in their clusters using 3 different colours. Matplotlib was used to visualize the data. IDE used is Jupyter Notebook.
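The elbow method can be sketched as follows, assuming nothing about the notebook beyond what is described above: run K-means for several values of k, record the inertia (the within-cluster sum of squared distances), and look for the k where the curve stops dropping sharply. This is a plain-Python version of Lloyd's algorithm with k-means++ seeding; the function names and the three-blob demo data are hypothetical. In scikit-learn, the same quantity is exposed as the `inertia_` attribute of a fitted `KMeans` model.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding. Returns (centroids, inertia)."""
    rng = random.Random(seed)
    # k-means++ seeding: later centroids favour points far from existing ones
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:  # fewer distinct points than k
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia


def elbow_curve(points, max_k, n_init=5):
    """Best inertia over n_init random seeds for each k = 1..max_k."""
    return {
        k: min(kmeans(points, k, seed=s)[1] for s in range(n_init))
        for k in range(1, max_k + 1)
    }


if __name__ == "__main__":
    rng = random.Random(42)
    centres = [(0, 0), (10, 10), (0, 10)]  # hypothetical blob centres
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centres for _ in range(30)]
    for k, inertia in elbow_curve(data, 6).items():
        print(k, round(inertia, 1))
```

Plotting these inertias against k (for example with Matplotlib) shows the bend; for the demo data it appears at k = 3, because adding a fourth centroid only splits an already tight cluster.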
This task predicts the prices of different cars using a car price dataset. Collaborated with 8 other young women on this project. IDE used is Jupyter Notebook.
This task trains a machine learning model to predict the winner of the 2023 UCL final between Manchester City and Inter Milan. It uses web scraping to extract UEFA Champions League (UCL) match results from FBref.com (a website devoted to tracking statistics for football teams and players from around the world), then cleans and visualizes the resulting dataset using pandas. The UEFA Champions League is one of the most prestigious tournaments in all of sports: a soccer tournament of 32 teams that compete in five rounds for the right to be crowned the best club in European soccer. The project processes include data extraction, data cleaning and model training. IDE used is Jupyter Notebook. In addition, a chatbot called Predictbot was built in the VS Code IDE and trained to hold football conversations with football enthusiasts. The JSON file and Python files have been attached.
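Scraping itself needs network access and is not shown here. The sketch below illustrates one cleaning step that scraped match results typically need before training: turning a score string into goal counts and a win/draw/loss label. It assumes each scraped row is a dict with `Home`, `Away` and `Score` keys and that scores look like `2-1` (hyphen or en dash); those column names and formats, and all function names, are hypothetical rather than taken from the project's notebook.

```python
import re

# Two goal counts separated by a hyphen, en dash or em dash (assumed format).
SCORE_PATTERN = re.compile(r"^\s*(\d+)\s*[-\u2013\u2014]\s*(\d+)\s*$")


def parse_score(score):
    """Split a score string such as '2-1' into (home_goals, away_goals).

    Returns None for unplayed or malformed fixtures (e.g. '' or None).
    Scores with penalty annotations like '(4) 1-1 (3)' are not handled.
    """
    match = SCORE_PATTERN.match(score or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def result_label(home_goals, away_goals):
    """Encode the outcome from the home side's perspective: W, D or L."""
    if home_goals > away_goals:
        return "W"
    if home_goals == away_goals:
        return "D"
    return "L"


def clean_matches(rows):
    """Keep only played matches and attach parsed goals and a result label."""
    cleaned = []
    for row in rows:
        parsed = parse_score(row.get("Score"))
        if parsed is None:
            continue  # drop fixtures without a usable score
        home_goals, away_goals = parsed
        cleaned.append({**row, "HomeGoals": home_goals,
                        "AwayGoals": away_goals,
                        "Result": result_label(home_goals, away_goals)})
    return cleaned


if __name__ == "__main__":
    scraped = [  # hypothetical rows, not real fixtures
        {"Home": "Team A", "Away": "Team B", "Score": "2\u20131"},
        {"Home": "Team C", "Away": "Team D", "Score": ""},
    ]
    print(clean_matches(scraped))
```

The `Result` column can then serve as the prediction target, with the goal counts feeding form-based features.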
This folder contains data analysis and model building for a dataset spanning 2007 to 2009, focusing on penguins from the Torgersen, Biscoe, and Dream islands. The journey was nothing short of amazing! The process involved a series of crucial steps: Data Extraction, Data Cleaning, Data Analysis, Data Interpretation, and Model Building. Penguins never cease to amaze, and through data analysis, more of their captivating world was uncovered. The data analysis and model building notebooks, as well as the dataset used, can be found in this folder.
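The Data Cleaning step can be illustrated with a short standard-library sketch: drop records with missing measurements, cast the numeric columns, and compute a per-group summary. The column names below are an assumption borrowed from the widely distributed palmerpenguins CSV (`species`, `island`, `bill_length_mm`, and so on), the `NA` missing-value marker is assumed, and the sample rows are hypothetical; the dataset in this folder may be laid out differently.

```python
import csv
import io

# Assumed numeric columns (palmerpenguins-style naming).
NUMERIC_COLUMNS = ["bill_length_mm", "bill_depth_mm",
                   "flipper_length_mm", "body_mass_g"]


def load_clean_penguins(csv_text):
    """Parse penguin records, drop rows with missing values, cast numerics."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        values = [record.get(col) for col in reader.fieldnames]
        # Treat absent fields, empty strings and "NA" as missing (assumption)
        if any(v is None or v.strip() in ("", "NA") for v in values):
            continue
        for col in NUMERIC_COLUMNS:
            record[col] = float(record[col])
        rows.append(record)
    return rows


def mean_by_group(rows, group_col, value_col):
    """Average value_col within each distinct value of group_col."""
    totals = {}
    for row in rows:
        s, n = totals.get(row[group_col], (0.0, 0))
        totals[row[group_col]] = (s + row[value_col], n + 1)
    return {group: s / n for group, (s, n) in totals.items()}


if __name__ == "__main__":
    sample = (  # hypothetical rows for illustration only
        "species,island,bill_length_mm,bill_depth_mm,"
        "flipper_length_mm,body_mass_g,sex,year\n"
        "Adelie,Torgersen,39.0,18.0,185,3700,female,2007\n"
        "Adelie,Torgersen,NA,NA,NA,NA,NA,2007\n"
        "Gentoo,Biscoe,47.0,15.0,215,5000,male,2008\n"
    )
    clean = load_clean_penguins(sample)
    print(len(clean), mean_by_group(clean, "island", "body_mass_g"))
```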
Exploratory Data Analysis (EDA) is a method for better understanding your data and assisting in subsequent data preprocessing. It is the critical process of conducting preliminary investigations of the data to discover patterns, identify anomalies, test hypotheses, and validate assumptions using summary statistics and graphical representations. As a step in the data analysis process, EDA employs a variety of techniques to better understand the dataset under consideration, and it accomplishes two major tasks:
First, it aids in cleaning up a dataset; second, it improves your understanding of the variables and their relationships. EDA helps gather insights and make better sense of the data, removes irregularities and unnecessary values, prepares the dataset for analysis, and helps a machine learning model make more accurate predictions. It also aids in selecting a suitable machine learning model.
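As a small illustration of "summary statistics" and "identifying anomalies", the sketch below summarizes one numeric column and flags outliers with the 1.5 x IQR rule (Tukey's fences). That rule is one common convention, chosen here for illustration rather than taken from any specific project in this collection; `summarize` is a hypothetical helper built on Python's standard `statistics` module.

```python
import statistics


def summarize(values):
    """Return basic summary statistics plus IQR-based outliers for one column."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Quartiles via the statistics module's default 'exclusive' method
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's fences
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in data if v < low or v > high],
    }


if __name__ == "__main__":
    print(summarize([1, 2, 3, 4, 5, 100]))
```

Here the value 100 falls above the upper fence and is flagged; whether to drop, cap, or keep such a value is a judgement call made during EDA, not something the rule decides.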
This project extracts, cleans and visualizes a dataset from a sample superstore. IDE used is Jupyter Notebook. The dataset used in this task was obtained from https://bit.ly/3i4rbWl
This project extracts, cleans, analyzes and visualizes a global terrorism dataset. The dataset used in this task was obtained from https://bit.ly/2TK5Xn5. MS Excel was used for the analysis, and a dashboard was created to visualize the dataset. The Excel workbook and a PDF file have also been attached.
Data analysis involves examining data and drawing conclusions from it. The main goal of data visualization is to make it easier to identify patterns and trends in a given dataset.
This task covers data analysis and data visualization on a fintech dataset. The objective of this task was to represent insights through visualizations. Power BI was used for analysis and visualization of the dataset. The Power BI raw file and a PDF file (for presentation purposes) have been attached.
Collaborated with a team of young women on a mental health product for women in Africa. The objective of this task was to analyse and visualize data collected from survey respondents. Power BI was used for analysis and visualization of the dataset. The Power BI raw file and a PDF file (for presentation purposes) have been attached.
Hackathons are a fantastic opportunity to learn new skills, where participants develop new approaches to solving problems or completing large projects that require collaboration.
Collaborated with a teammate to explore the dataset, formulate a research focus ("How The FIFA Women's World Cup Game Has Evolved"), analyze the data, create a compelling visualization, and present the research in a PowerPoint presentation. The Python programming language was used for data cleaning, and the steps were documented in a Python notebook. Power BI was used to derive analytical insights, create visualizations, and design the dashboard.
A copy of the Python notebook, the Power BI raw file, the dashboard in the form of a poster, and a PowerPoint presentation file have been attached.
This folder hosts the DataFest Africa Datathon project. This project extracts, cleans and visualizes a dataset obtained from an online payment platform using pandas. The dataset represents transactions and user-related data collected over time from the platform. Fraud detection is a set of activities undertaken to prevent money or property from being obtained through false pretenses. It is essential for companies to safeguard their customers' transactions and accounts by detecting fraud before or as it happens. The primary goal of the project is to develop an advanced predictive model to identify potentially fraudulent transactions. The project processes include: Data Extraction, Data Cleaning, Data Exploration and Analysis, Feature Engineering, Model Evaluation, and Fine-Tuning.
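The metrics used in the project's Model Evaluation step are not stated here. The sketch below shows why fraud models are commonly judged on precision and recall for the fraudulent class rather than on accuracy alone: when fraud is rare (an assumption about this kind of data, not a figure from the Datathon dataset), a model that never flags anything can still score high accuracy. The function name and the label encoding (1 = fraud) are hypothetical.

```python
def fraud_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F1 for the positive (fraud) class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: share of flagged transactions that really were fraud
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of actual fraud that the model caught
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical: 100 transactions, 2 fraudulent, model flags nothing
    labels = [0] * 98 + [1] * 2
    never_flags = [0] * 100
    print(fraud_metrics(labels, never_flags))
```

In the demo the all-negative model reaches 98% accuracy yet catches none of the fraud (recall 0), which is exactly the failure that precision and recall expose.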
ML Zoomcamp is a four-month-long program for getting started with machine learning engineering. This folder hosts all projects and homework for the ML Zoomcamp. The course repository is on GitHub: http://mlzoomcamp.com
The Python Data Analysis course is a comprehensive course with real-world, hands-on projects on the Udemy learning platform. Link to the course: https://www.udemy.com/course/bigdata-analysis-python/?couponCode=NEW1_SEPT9_PROJECTS
This contains the weather data analysis project, including the dataset used. The weather dataset is a time-series dataset with hourly information about the weather conditions at a particular location.
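A common first step with hourly time-series data is to aggregate it to a coarser period. The sketch below averages hourly readings into daily means using only the standard library; it assumes timestamps formatted like `2012-01-01 00:00:00` and (timestamp, value) pairs as input, both hypothetical choices rather than the layout of the project's dataset. With pandas, the equivalent on a DataFrame indexed by a `DatetimeIndex` is `resample("D").mean()`.

```python
from collections import defaultdict
from datetime import datetime


def daily_means(records, timestamp_format="%Y-%m-%d %H:%M:%S"):
    """Average hourly readings into one value per calendar day.

    records: iterable of (timestamp_string, numeric_value) pairs.
    Returns a dict mapping datetime.date -> mean value, in date order.
    """
    buckets = defaultdict(list)
    for stamp, value in records:
        # Parse the timestamp and keep only its calendar date as the bucket key
        day = datetime.strptime(stamp, timestamp_format).date()
        buckets[day].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


if __name__ == "__main__":
    hourly_temps = [  # hypothetical readings
        ("2012-01-01 00:00:00", -1.8),
        ("2012-01-01 01:00:00", -1.6),
        ("2012-01-02 00:00:00", -0.4),
    ]
    for day, mean_temp in daily_means(hourly_temps).items():
        print(day, round(mean_temp, 2))
```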
This contains the car data analysis project including the dataset used. The car dataset is a trove of information about various cars, including details like make, model, engine size, number of cylinders and much more.
This contains the police data analysis project. The police dataset contains data from a police checkpost.
This contains the COVID-19 data analysis project. It contains a small COVID-19 dataset, used for learning purposes only. The data covers the period up to 29-April-2020.
This contains the London housing data analysis project. The dataset is centered on the London housing market and includes relevant data such as monthly average house prices, the yearly number of houses sold and the monthly number of crimes committed.
This contains the India census data analysis project. The dataset contains district-level data from the 2011 India Census in CSV format, sourced from Kaggle. It covers total population, demography, literacy, districts, states, workers, religion, education, and age in India.
This contains the Udemy course data analysis project. The dataset is a comprehensive compilation of course-related information covering a wide array of subjects available on the Udemy platform, giving a thorough picture of the educational offerings within the Udemy ecosystem.