-
Notifications
You must be signed in to change notification settings - Fork 0
/
Cleaning Basics with R
90 lines (56 loc) · 2.4 KB
/
Cleaning Basics with R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
# Data Cleaning Basics with R
## Step 1: Load Packages
### Install and load required packages
install.packages("tidyverse")
install.packages("skimr")
install.packages("janitor")
install.packages("lubridate") ### Additional package for date manipulation
### Load the installed packages
library(tidyverse)
library(skimr)
library(janitor)
library(lubridate)
### Verify loaded packages
search()
## Step 2: Import Data
### Import CSV data from "hotel_bookings.csv" and create a data frame
bookings_df <- read_csv("hotel_bookings.csv")
## Step 3: Explore Data
### View the first few rows of the dataset
head(bookings_df)
* As another option, I could also use 'tails' to view the last few rows of the dataset.
### Summarize the structure of the dataset
str(bookings_df)
### Preview the data using the 'glimpse' function
glimpse(bookings_df)
### Get column names
colnames(bookings_df)
## Step 4: Cleaning Data
### Handling Missing Data
#### Identify Missing Values
### Create a logical matrix for missing values
missing_values <- is.na(bookings_df)
### Summarize missing value percentages
missing_percentage <- colMeans(missing_values) * 100
### Display missing value percentages
print(missing_percentage)
### Handle missing values (example: removing rows with missing 'children' values)
bookings_df_clean <- na.omit(bookings_df)
## Step 5: Remove Duplicates
### Remove duplicate rows
bookings_df_clean <- bookings_df_clean[!duplicated(bookings_df_clean), ]
## Step 6: Save the Clean Data
### Specify the file path for the cleaned data
cleaned_data_file <- "GJs_Data_Projects.csv"
### Save the cleaned data to a CSV file
write.csv(bookings_df_clean, file = cleaned_data_file, row.names = FALSE)
### Confirm the successful saving of the cleaned data
if (file.exists(cleaned_data_file)) {
cat("Cleaned data saved successfully to", cleaned_data_file, "\n")
} else {
cat("Error: Failed to save cleaned data to", cleaned_data_file, "\n")
}
* Cleaned data saved successfully to GJs_Data_Projects.csv!
# Conclusion
In this project, I explored data cleaning basics with R. The proejct involves fixing missing data, removing duplicates, and saving cleaned data. As both an analyst and a musician, I recognize the importance of quality and excellence, whether it's in the realm of data analysis or the world of music.
Thank you for taking your time to follow along with me in this project. Stay tuned for more data-related endeavors!:)