Data Analysis Using Python

Introduction

Data science and data analysis now play a significant role in almost every industry, for example finance, e-commerce, business, education, and government. Organizations take a 360-degree view of their customers, analyzing their behavior and interests in order to make decisions in their favor. This analysis is usually done in a programming language such as Python, one of the most versatile languages, which supports a wide range of tasks. Netflix is a well-known example: it reached the top of its market largely by analyzing its customers' interests in detail.

Keywords: Data Visualization, Anaconda, Jupyter Notebook, Exploratory Data Analysis, Machine Learning.

Duration: 30 hrs. (12 days × 2.5 hrs.)

Content

Day-wise schedule (topic, subtopics, and duration):

Day 1: Introduction to Data and Data Analysis Using Python (2.5 hrs.)
  • Introduction to Data
  • Types of Data in Statistics (Numerical & Categorical)
  • Types of Data in the Real World
  • Python Introduction
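
To illustrate the numerical/categorical split, here is a minimal sketch assuming pandas is installed; the table, its column names and its values are made up for illustration. pandas' `select_dtypes` separates number-typed columns from the rest.

```python
import pandas as pd

# Hypothetical toy dataset (made-up values)
df = pd.DataFrame({
    "age": [23, 35, 41],                     # numerical, discrete
    "income": [32000.0, 54000.5, 61000.0],   # numerical, continuous
    "city": ["North", "South", "North"],     # categorical, nominal
    "size": pd.Categorical(["S", "M", "L"],  # categorical, ordinal
                           categories=["S", "M", "L"], ordered=True),
})

numerical = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(exclude="number").columns.tolist()
print("Numerical:", numerical)
print("Categorical:", categorical)
```

The dtype is only a heuristic: a numeric code such as a postal code is stored as a number but is categorical in meaning.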

Day 2: Introduction to Python & Conditional Statements (2.5 hrs.)
  • Literate Programming
  • Jupyter Notebook Environment
  • Markdown Format for Documentation
  • Python Basics
  • Operators in Python
  • Conditional Statements in Python
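
A small example of the operators and conditional statements covered on Day 2, as they might be typed into a Jupyter Notebook cell. The `grade` function and its thresholds are hypothetical.

```python
def grade(score: float) -> str:
    """Map a 0-100 score to a letter grade using if/elif/else."""
    if not 0 <= score <= 100:  # chained comparison operators
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    elif score >= 50:
        return "C"
    else:
        return "F"


# Arithmetic, comparison and logical operators
total = 7 + 3 * 2        # 13, because * binds tighter than +
quotient = 7 // 2        # 3, floor division
remainder = 7 % 2        # 1, modulo
check = remainder == 0 and total > 10  # False, since remainder is 1

print(grade(92), grade(80), grade(40))  # A B F
```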

Day 3: Loops and Data Structures in Python (2.5 hrs.)
  • Iterations
  • Strings
  • String Functions, String Slicing
  • Python Data Structures
  • Lists
  • List Methods
  • Tuples
  • Tuple Methods
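
A sketch of Day 3's loops, string functions and slicing, and list and tuple methods, using plain Python only; the sample sentence is arbitrary.

```python
text = "Data Analysis Using Python"

# String functions and slicing
print(text.lower())      # data analysis using python
print(text[:4])          # Data
print(text[-6:])         # Python
print(text[::-1][:6])    # nohtyP (reversed string, first six characters)
words = text.split()     # ['Data', 'Analysis', 'Using', 'Python']

# Iterating with a for loop to build a list
lengths = []
for word in words:
    lengths.append(len(word))   # [4, 8, 5, 6]

# List methods
lengths.sort(reverse=True)      # [8, 6, 5, 4]
lengths.insert(0, 10)           # [10, 8, 6, 5, 4]
last = lengths.pop()            # 4; list is now [10, 8, 6, 5]

# Tuples are immutable; their methods are count() and index()
point = (3, 4, 3)
print(point.count(3), point.index(4))  # 2 1

# A while loop that counts down
n = 3
while n > 0:
    n -= 1
```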

Day 4: File, Packages and Functional Programming (2.5 hrs.)
  • Dictionaries
  • Dictionary Methods
  • File Handling
  • Packages and Modules
  • List & Dictionary Comprehension
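
The sketch below combines dictionary methods, list and dictionary comprehensions, standard-library modules (`json`, `os`, `tempfile`) and basic file handling. The names, the scores and the +5 "curve" are made up; the file is written to a temporary directory.

```python
import json      # standard-library module for reading/writing JSON
import os
import tempfile

scores = {"asha": 82, "ravi": 67, "meera": 91}  # made-up data

# Dictionary methods
scores["john"] = 74                   # add a new key/value pair
missing = scores.get("zoe", 0)        # 0: default returned for a missing key
ranking = sorted(scores, key=scores.get, reverse=True)  # keys by descending score

# List and dictionary comprehensions
passed = [name for name, s in scores.items() if s >= 70]
curved = {name: min(s + 5, 100) for name, s in scores.items()}

# File handling: write a JSON file, then read it back
path = os.path.join(tempfile.mkdtemp(), "scores.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(curved, f)
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```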

Day 5: Data Manipulation with NumPy (2.5 hrs.)
  • Introduction
  • NumPy Arrays
  • NumPy Basics
  • Math
  • Random
  • Indexing
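
A minimal NumPy sketch covering array creation, element-wise math, seeded random numbers and indexing, assuming NumPy 1.17 or later (for `np.random.default_rng`). The array values are arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # a 2 x 3 array
print(a.shape, a.ndim)                 # (2, 3) 2

# Math: element-wise operations and aggregations
b = a * 10                  # scalar broadcast to every element
col_means = a.mean(axis=0)  # [2.5, 3.5, 4.5]
row_sums = a.sum(axis=1)    # [6, 15]
roots = np.sqrt(a)          # a universal function (ufunc)

# Random: a seeded generator gives reproducible numbers
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
dice = rng.integers(1, 7, size=10)  # integers from 1 to 6 (high is exclusive)

# Indexing, slicing and boolean masks
print(a[1, 2])   # 6
print(a[:, 1])   # [2 5]
print(a[a > 3])  # [4 5 6]
```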

Day 6: Introduction to Pandas and Pandas Series (2.5 hrs.)
  • Filtering
  • Statistics
  • Aggregation
  • Saving Data
  • Introduction
  • Series
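
A sketch of the pandas Series operations listed for Day 6 (filtering, statistics, aggregation and saving), using made-up daily sales figures and a temporary CSV path.

```python
import os
import tempfile

import pandas as pd

# A Series is a one-dimensional labelled array (values are made up)
sales = pd.Series([120, 95, 180, 60], index=["Mon", "Tue", "Wed", "Thu"], name="sales")

# Filtering with a boolean mask
high = sales[sales > 100]              # Mon 120, Wed 180

# Statistics
mean, median = sales.mean(), sales.median()   # 113.75, 107.5
spread = sales.std()                          # sample standard deviation (ddof=1)

# Aggregation with several functions at once
summary = sales.agg(["sum", "min", "max"])    # sum 455, min 60, max 180

# Saving data to CSV and reading it back
path = os.path.join(tempfile.mkdtemp(), "sales.csv")
sales.to_csv(path)
loaded = pd.read_csv(path, index_col=0)["sales"]
```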

Day 7: Data Analysis with pandas (2.5 hrs.)
  • DataFrame
  • Combining
  • Indexing
  • File I/O
  • Grouping
  • Features
  • Filtering
  • Sorting
  • Statistics
  • Plotting
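
For Day 7, a sketch of combining, indexing, filtering, sorting, grouping and summary statistics on a DataFrame. The `orders` and `customers` tables are hypothetical.

```python
import pandas as pd

# Hypothetical example tables
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": [10, 11, 10, 12, 11],
    "amount": [250.0, 120.0, 80.0, 310.0, 45.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "city": ["North", "South", "North"],
})

# Combining: join the two tables on a shared key column
df = orders.merge(customers, on="customer_id", how="left")

# Indexing: label-based (.loc) and position-based (.iloc)
first_amount = df.loc[0, "amount"]   # 250.0
last_row = df.iloc[-1]

# Filtering, then sorting by amount (largest first)
big = df[df["amount"] > 100].sort_values("amount", ascending=False)

# Grouping and aggregation per city
by_city = df.groupby("city")["amount"].agg(["count", "sum", "mean"])

# Descriptive statistics for one column
stats = df["amount"].describe()
```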

Day 8: Data Preprocessing with scikit-learn (2.5 hrs.)
  • Introduction
  • Standardizing Data
  • Data Range
  • Robust Scaling
  • Normalizing Data
  • Data Imputation
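
A sketch mapping Day 8's subtopics to scikit-learn transformers, assuming scikit-learn is installed: `SimpleImputer` for imputation, `StandardScaler` for standardizing, `MinMaxScaler` for data range, `RobustScaler` for robust scaling and `Normalizer` for normalizing. The feature matrix is made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, Normalizer, RobustScaler, StandardScaler

# Two made-up features on very different scales; np.nan marks a missing value
X = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 400.0],
    [100.0, 600.0],   # 100.0 is an outlier in the first column
])

# Data imputation: replace NaN with the column mean (here 400.0)
X_filled = SimpleImputer(strategy="mean").fit_transform(X)

# Standardizing: each column gets zero mean and unit variance
X_std = StandardScaler().fit_transform(X_filled)

# Data range: rescale each column to [0, 1]
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X_filled)

# Robust scaling: subtract the median, divide by the IQR (less sensitive to outliers)
X_robust = RobustScaler().fit_transform(X_filled)

# Normalizing: rescale each *row* to unit L2 length
X_norm = Normalizer(norm="l2").fit_transform(X_filled)
```

Imputation runs first here so every later step sees complete data. In practice these transformers are fitted on training data only and then reused on test data.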

Day 9: Cleaning Data in Python (2.5 hrs.)
  • Working with Duplicates and Missing Values
  • Deciding which values should be replaced with missing values, based on the data
  • Identifying and Eliminating Outliers
  • Dropping Duplicate Data
  • Filling Missing Data
  • Applying these techniques to a raw dataset; introduction to Kaggle and other data sources
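
A sketch of the Day 9 cleaning steps with pandas on a made-up table: dropping duplicates, turning a placeholder code (-1) into a missing value, filling it with the median, and removing outliers with the common 1.5 × IQR rule. The placeholder code and the choice of the IQR rule are assumptions for illustration; the right choices depend on the data.

```python
import numpy as np
import pandas as pd

# Made-up raw data: one duplicate row, a placeholder age and a salary outlier
raw = pd.DataFrame({
    "name": ["A", "B", "B", "C", "D", "E"],
    "age": [25, 32, 32, -1, 29, 31],        # -1 assumed to mean "unknown"
    "salary": [50000, 52000, 52000, 51000, 1_000_000, 49000],
})

# Drop exact duplicate rows
df = raw.drop_duplicates().reset_index(drop=True)

# Replace the placeholder with a proper missing value
df["age"] = df["age"].replace(-1, np.nan)

# Fill missing data, here with the median age (30.0)
df["age"] = df["age"].fillna(df["age"].median())

# Identify outliers with the 1.5 * IQR rule and keep only the inliers
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
inliers = df["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[inliers]
```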

Day 10: Introduction to Data Visualization and Matplotlib (2.5 hrs.)
  • Introduction to Visualization and Python Packages
  • Matplotlib History
  • Introduction to Plotting
  • Line Plot
  • Scatter Plot
  • Bar Graph
  • Histogram
  • Pie Chart
  • Box Plot
  • Tasks
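
A sketch of the six Matplotlib chart types listed for Day 10, drawn on one figure with made-up data. It selects the non-interactive `Agg` backend and saves to an in-memory buffer so it runs without a display; in a Jupyter Notebook the figure would normally just be shown inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(10)
y = x ** 2
values = rng.normal(loc=50, scale=10, size=200)  # made-up sample for histogram/box plot

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
axes[0, 0].plot(x, y, marker="o")
axes[0, 0].set_title("Line Plot")
axes[0, 1].scatter(x, y + rng.normal(0, 5, size=x.size))
axes[0, 1].set_title("Scatter Plot")
axes[0, 2].bar(["A", "B", "C"], [5, 3, 7])
axes[0, 2].set_title("Bar Graph")
axes[1, 0].hist(values, bins=20)
axes[1, 0].set_title("Histogram")
axes[1, 1].pie([40, 35, 25], labels=["X", "Y", "Z"], autopct="%1.0f%%")
axes[1, 1].set_title("Pie Chart")
axes[1, 2].boxplot(values)
axes[1, 2].set_title("Box Plot")
fig.tight_layout()

# Save as PNG into memory; a file path would work the same way
buf = io.BytesIO()
fig.savefig(buf, format="png")
```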
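
Day 11: Data Visualization using Seaborn (2.5 hrs.)
  • Using Seaborn Styles
  • Setting the Default Style
  • Color Palettes
  • Creating Custom Palettes
  • stripplot() and swarmplot()
  • Box plots, violin plots and letter-value plots (boxplot(), violinplot() and boxenplot(), which was called lvplot() before seaborn 0.9)
  • barplot(), pointplot() and countplot()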

Day 12: Data Visualization using Seaborn (2.5 hrs.)
  • Using Seaborn Styles
  • Setting the Default Style
  • Color Palettes
  • Regression Plots
  • Binning Data
  • Matrix Plots
  • Creating Heatmaps
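
A sketch of Day 12's topics with made-up data, assuming seaborn is installed: a regression plot, binning (with `pandas.cut`, and with the `x_bins` argument of `regplot`), and a heatmap of a correlation matrix as the matrix plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: y depends linearly on x plus noise; z depends on both
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
df = pd.DataFrame({"x": x, "y": 2 * x + rng.normal(0, 2, size=100)})
df["z"] = df["x"] - df["y"] + rng.normal(0, 1, size=100)

# Binning with pandas: assign each x value to an interval
df["x_bin"] = pd.cut(df["x"], bins=[0, 2.5, 5, 7.5, 10], include_lowest=True)
bin_counts = df["x_bin"].value_counts(sort=False)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
sns.regplot(data=df, x="x", y="y", ax=axes[0])            # scatter plus fitted line
sns.regplot(data=df, x="x", y="y", x_bins=4, ax=axes[1])  # mean of y (with CI) per x bin

# Matrix plot: heatmap of the correlation matrix
corr = df[["x", "y", "z"]].corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=axes[2])
fig.tight_layout()
```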

Course Objectives

The main goal of this course is to help students and faculty learn, understand, and practice data analysis and machine learning approaches, including modern data computing technologies and techniques for scaling up machine learning, with a focus on industry applications. In particular, the course aims to help participants conceptualize and summarize data analysis and machine learning technologies and techniques, including approaches for scaling up machine learning.

Entry Requirements (Pre-requisites)

Students must have knowledge of Python programming and statistics.

Hardware Requirements

  • A laptop/desktop with an Intel Core i3 or higher processor is required
  • 4 GB RAM or more is recommended
  • Good internet connectivity
  • Windows 10 OS is preferred