
My ACM Research submission. Data Preprocessing using Pandas & Matplotlib.


CSheppardCodes/coding-challenge-2022-fall

 
 


ACM Research coding challenge (Fall 2022)

CHRISTOPHER SHEPPARD

This semester's challenge is especially open-ended. Here is a dataset on Kaggle called "CarsForSale". It contains data scraped from the online car marketplace Cars.com. Each row contains 25 pieces of information about a car's listing, such as its price, year, model, and color.

The challenge is to do something interesting with the data. Can you find a pattern, answer a question, or create a visualization?

Three-step process

1. Clean Data

2. Find Patterns in Data

3. Visualize Useful Information

1.1 DATA

Display the data types and columns.
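A minimal sketch of this step with pandas. The tiny DataFrame below is hypothetical stand-in data (only the column names echo ones mentioned in this write-up); with the real dataset you would load the Kaggle CSV with `pd.read_csv` instead.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle "CarsForSale" data, not real listings.
# With the real dataset: df = pd.read_csv("<path to the dataset CSV>")
df = pd.DataFrame({
    "Make": ["Toyota", "Ford", "BMW"],
    "Year": [2019, 2017, 2021],
    "Price": ["$23,998", "$15,500", "$41,250"],
    "Mileage": [32000, 58000, 9000],
    "ConsumerRating": [4.8, 4.5, 4.7],
})

# Column names and the dtype pandas inferred for each one.
print(df.columns.tolist())
print(df.dtypes)

# info() adds non-null counts, which helps spot missing values early.
df.info()
```

Note that Price shows up as a text column here, which is why the next step converts it.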

1.2 PRICE TO NUMERIC

Convert the Price column from strings to a numeric type for later processing.


RESULT: Price is now a float64 column.
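One plausible way to do the conversion, assuming prices are stored as strings like "$23,998". The "Not Priced" entry is a hypothetical placeholder that illustrates why the result can end up as float64 rather than int: `errors="coerce"` turns unparseable entries into NaN, and NaN cannot be stored in an int64 column.

```python
import pandas as pd

# Hypothetical raw prices; the real column's exact formatting is an assumption.
df = pd.DataFrame({"Price": ["$23,998", "$15,500", "Not Priced", "$41,250"]})

# Strip the dollar sign and thousands separators, then parse as numbers.
# errors="coerce" turns anything unparseable (e.g. "Not Priced") into NaN.
df["Price"] = pd.to_numeric(
    df["Price"].str.replace(r"[$,]", "", regex=True),
    errors="coerce",
)

print(df["Price"].dtype)  # a float dtype, since NaN cannot live in an int64 column
print(df["Price"].tolist())
```

If a true integer column is needed later, the NaN rows have to be dropped or filled first.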


1.3 Used/New

Convert all certification labels in the Used/New column to a single Certified value.


RESULT: Used and Certified are now the only unique values.
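A minimal sketch of the relabeling, assuming the column is named "Used/New" and that every certified listing carries a label containing the word "Certified". The sample labels below are hypothetical.

```python
import pandas as pd

# Hypothetical labels; the column name "Used/New" is an assumption.
df = pd.DataFrame({"Used/New": ["Used", "Toyota Certified", "Used", "BMW Certified"]})

# Boolean mask: True for any label that mentions "Certified" (case-insensitive).
is_certified = df["Used/New"].str.contains("Certified", case=False, na=False)

# Overwrite just those rows with one canonical value.
df.loc[is_certified, "Used/New"] = "Certified"

print(df["Used/New"].unique())
```

Matching on a substring keeps the rule short; if the real labels are less regular, an explicit mapping with `Series.replace` would be the safer choice.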

image

2.1 ConsumerRating

Find the variables most strongly correlated with ConsumerRating.

RESULT: The mean ConsumerRating is 4.702762961382547.


RESULT: These four columns are highly correlated with ConsumerRating:

ValueForMoneyRating: 0.917873

ReliabilityRating: 0.914597

ComfortRating: 0.860040

PerformanceRating: 0.805849
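A sketch of how the mean and the ranked correlations could be computed with pandas. The ratings below are made up, so the numbers will not match the results above; `numeric_only=True` (pandas 1.5+) keeps text columns out of `corr()`.

```python
import pandas as pd

# Hypothetical ratings; only the column names come from the write-up.
df = pd.DataFrame({
    "ConsumerRating":      [4.9, 4.2, 4.7, 4.5, 4.8],
    "ValueForMoneyRating": [4.8, 4.0, 4.6, 4.4, 4.7],
    "ReliabilityRating":   [4.9, 4.3, 4.6, 4.5, 4.9],
    "Mileage":             [12000, 80000, 30000, 65000, 20000],
})

print("Mean ConsumerRating:", df["ConsumerRating"].mean())

# Pearson correlation of every numeric column with ConsumerRating,
# strongest first, dropping the trivial self-correlation of 1.0.
corr = (
    df.corr(numeric_only=True)["ConsumerRating"]
    .drop("ConsumerRating")
    .sort_values(ascending=False)
)
print(corr.head(4))
```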

2.2 ALL CORRELATIONS

Display the correlations between all numeric columns.
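A sketch of the full correlation matrix plus a Matplotlib heatmap. The listings are hypothetical, and the heatmap styling (colormap, labels) is an illustrative choice, not necessarily what the notebook uses.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs headless; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample listings, not real Cars.com data.
df = pd.DataFrame({
    "Make": ["Toyota", "Ford", "BMW", "Honda", "Ford"],
    "Year": [2019, 2015, 2021, 2018, 2016],
    "Price": [23998, 12500, 41250, 21000, 14900],
    "Mileage": [32000, 90000, 9000, 45000, 70000],
})

# Square Pearson correlation matrix over the numeric columns only.
corr = df.corr(numeric_only=True)
print(corr.round(2))

# Heatmap: red cells are positive correlations, blue cells negative.
fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Pearson r")
fig.tight_layout()
```

Fixing `vmin=-1` and `vmax=1` keeps the colors comparable across datasets, since a correlation can never leave that range.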


2.3 PRICE CORRELATIONS

Display the correlation of each numeric column with Price.


RESULT: Price is somewhat positively correlated with Year and negatively correlated with Mileage.
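A sketch of the Price column of the correlation matrix, using hypothetical listings built so that newer, lower-mileage cars cost more. A positive value means Price tends to rise with that column; a negative value means it tends to fall.

```python
import pandas as pd

# Hypothetical listings, constructed for illustration only.
df = pd.DataFrame({
    "Price":   [41250, 15500, 23998, 33000, 12000],
    "Year":    [2021, 2015, 2018, 2020, 2014],
    "Mileage": [9000, 88000, 41000, 20000, 105000],
})

# Pearson correlation of each numeric column with Price, most negative first.
price_corr = df.corr(numeric_only=True)["Price"].drop("Price").sort_values()
print(price_corr)
```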

2.4 MAKE

Find the most common car make among the listings.
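A minimal sketch with `value_counts`, which returns counts sorted from most to least frequent. The makes below are hypothetical.

```python
import pandas as pd

# Hypothetical listings; only the Make column matters here.
df = pd.DataFrame({"Make": ["Toyota", "Ford", "Toyota", "BMW", "Ford", "Toyota"]})

# Count listings per make, most frequent first.
make_counts = df["Make"].value_counts()
print(make_counts.head(10))
print("Most common make:", make_counts.idxmax())
```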


3.1 HISTOGRAMS

Display histograms of all available numeric columns.
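A sketch using `DataFrame.hist`, which draws one histogram per numeric column and skips text columns such as Make. The data and the `bins`/`figsize` values are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample listings, not real Cars.com data.
df = pd.DataFrame({
    "Make": ["Toyota", "Ford", "BMW", "Honda", "Ford"],
    "Year": [2019, 2015, 2021, 2018, 2016],
    "Price": [23998, 12500, 41250, 21000, 14900],
    "Mileage": [32000, 90000, 9000, 45000, 70000],
})

# Returns a 2-D array of Axes, one titled subplot per numeric column
# (unused grid cells are hidden).
axes = df.hist(bins=10, figsize=(10, 6))
plt.tight_layout()
```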


3.2 VALUE TYPES

Display the final data types.


3.3 ONE-HOT ENCODING

Apply one-hot encoding with pandas to the categorical columns in preparation for future predictive modeling.
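A minimal sketch, assuming `pd.get_dummies` is the pandas tool in question and that Make and "Used/New" are among the categorical columns to encode (the column names and values are assumptions).

```python
import pandas as pd

# Hypothetical cleaned listings.
df = pd.DataFrame({
    "Make": ["Toyota", "Ford", "BMW"],
    "Used/New": ["Used", "Certified", "Used"],
    "Price": [23998.0, 15500.0, 41250.0],
})

# Each listed column is replaced by one 0/1 column per category,
# named "<column>_<category>"; other columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["Make", "Used/New"], dtype=int)
print(encoded.columns.tolist())
```

Passing `drop_first=True` drops one dummy column per feature, which avoids perfectly collinear columns when the encoded data feeds a linear model.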



Languages

  • Jupyter Notebook 100.0%