This project cleans and standardizes the Nashville Housing dataset. The SQL queries standardize date formats, populate missing property addresses, break addresses down into individual columns, and normalize 'SoldAsVacant' values. The cleaning process also identifies duplicate records and drops unused columns to leave a leaner dataset.
- Converts the 'SaleDate' column to a standardized date format.
- Populates missing property addresses by updating null values based on ParcelID.
- Splits the 'PropertyAddress' column into 'PropertySplitAddress' and 'PropertySplitCity'.
- Uses `PARSENAME` to split 'OwnerAddress' into 'OwnerSplitAddress', 'OwnerSplitCity', and 'OwnerSplitState'.
- Updates the 'SoldAsVacant' column to replace 'Y' with 'Yes' and 'N' with 'No'.
- Identifies and displays duplicate records based on specific columns.
- Drops columns 'OwnerAddress', 'TaxDistrict', 'PropertyAddress', and 'SaleDate' to streamline the dataset.
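
A few of the steps above can be tried out without a database server. The following is a minimal sketch, not the repository's actual queries: it uses Python's built-in `sqlite3` module and SQLite syntax (`substr`/`instr` rather than SQL Server functions such as `PARSENAME`). The table name `NashvilleHousing`, the `UniqueID` key, the four sample rows, and the columns used to define a duplicate are all assumptions made for illustration.

```python
import sqlite3

# In-memory database with a hypothetical, cut-down version of the table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE NashvilleHousing (
        UniqueID        INTEGER PRIMARY KEY,
        ParcelID        TEXT,
        PropertyAddress TEXT,
        SaleDate        TEXT,
        SoldAsVacant    TEXT
    )
""")
conn.executemany(
    "INSERT INTO NashvilleHousing VALUES (?, ?, ?, ?, ?)",
    [  # made-up sample rows, not taken from the real dataset
        (1, "A-1", "100 MAIN ST, NASHVILLE", "2013-04-09", "N"),
        (2, "A-1", None, "2014-06-10", "Yes"),
        (3, "B-2", "22 OAK AVE, GOODLETTSVILLE", "2015-01-05", "Y"),
        (4, "B-2", "22 OAK AVE, GOODLETTSVILLE", "2015-01-05", "Y"),
    ],
)

conn.executescript("""
    -- Populate missing addresses from another row with the same ParcelID.
    UPDATE NashvilleHousing
    SET PropertyAddress = (
        SELECT b.PropertyAddress
        FROM NashvilleHousing AS b
        WHERE b.ParcelID = NashvilleHousing.ParcelID
          AND b.UniqueID <> NashvilleHousing.UniqueID
          AND b.PropertyAddress IS NOT NULL
        LIMIT 1
    )
    WHERE PropertyAddress IS NULL;

    -- Split 'street, city' at the comma into two new columns.
    ALTER TABLE NashvilleHousing ADD COLUMN PropertySplitAddress TEXT;
    ALTER TABLE NashvilleHousing ADD COLUMN PropertySplitCity TEXT;
    UPDATE NashvilleHousing
    SET PropertySplitAddress =
            trim(substr(PropertyAddress, 1, instr(PropertyAddress, ',') - 1)),
        PropertySplitCity =
            trim(substr(PropertyAddress, instr(PropertyAddress, ',') + 1));

    -- Normalize 'Y'/'N' to 'Yes'/'No'; leave other values unchanged.
    UPDATE NashvilleHousing
    SET SoldAsVacant = CASE SoldAsVacant
                           WHEN 'Y' THEN 'Yes'
                           WHEN 'N' THEN 'No'
                           ELSE SoldAsVacant
                       END;
""")

# Flag duplicates: every row after the first within a group of identical
# (ParcelID, PropertyAddress, SaleDate) values. The choice of columns is an
# assumption. Window functions need SQLite 3.25 or newer.
duplicate_ids = [
    row[0]
    for row in conn.execute("""
        WITH Ranked AS (
            SELECT UniqueID,
                   ROW_NUMBER() OVER (
                       PARTITION BY ParcelID, PropertyAddress, SaleDate
                       ORDER BY UniqueID
                   ) AS row_num
            FROM NashvilleHousing
        )
        SELECT UniqueID FROM Ranked WHERE row_num > 1 ORDER BY UniqueID
    """)
]

rows = conn.execute("""
    SELECT UniqueID, PropertySplitAddress, PropertySplitCity, SoldAsVacant
    FROM NashvilleHousing
    ORDER BY UniqueID
""").fetchall()

for row in rows:
    print(row)
print("Duplicate UniqueIDs:", duplicate_ids)
```

Filling the addresses before checking for duplicates matters here: a row whose address was still NULL would otherwise fall into its own partition and could hide a duplicate.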
- Clone the repository: `git clone https://github.com/SaiSurajMatta/Nashville-Housing-Data-Cleaning-Project`
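- Execute the SQL queries in the specified order in your SQL environment. The queries use `PARSENAME`, a SQL Server function, so other database engines may need an equivalent.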
- Explore the cleaned Nashville Housing dataset with refined structure and standardized values.
Feel free to contribute, suggest improvements, or adapt the queries for similar data-cleaning tasks.
Dataset Link: https://www.kaggle.com/datasets/tmthyjames/nashville-housing-data
- The project utilizes the Nashville Housing dataset.