This project has two parts that demonstrate the importance and value of data visualization techniques in the data analysis process. In the first part, I used Python visualization libraries to systematically explore the Ford GoBike System Dataset, starting from plots of single variables and building up to plots of multiple variables. In the second part, I produced a short presentation that illustrates the interesting properties, trends, and relationships that I discovered in the dataset.
This dataset includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay Area. It contains 183,412 records and 16 columns:
- duration_sec
- start_time
- end_time
- start_station_id
- start_station_name
- start_station_latitude
- start_station_longitude
- end_station_id
- end_station_name
- end_station_latitude
- end_station_longitude
- bike_id
- user_type
- member_birth_year
- member_gender
- bike_share_for_all_trip
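
As a rough illustration of how these columns can be prepared for exploration, the sketch below reads a few trip records with pandas and derives the features used later in the analysis. The sample rows are made up, the timestamp format is assumed, and the names `load_trips`, `hour`, `age`, and `reference_year` are hypothetical choices for this example; only `duration_in_min`, `weekday`, and the raw column names come from this project.

```python
import io

import pandas as pd

# Made-up rows that mimic a few of the columns listed above (not real data).
SAMPLE_CSV = """duration_sec,start_time,end_time,user_type,member_birth_year,member_gender
420,2019-02-28 08:05:00,2019-02-28 08:12:00,Subscriber,1985,Male
1500,2019-02-26 17:30:00,2019-02-26 17:55:00,Customer,1992,Female
300,2019-02-21 17:10:00,2019-02-21 17:15:00,Subscriber,1900,Other
"""


def load_trips(source, reference_year=2019):
    """Read trip records and add derived columns for the exploration.

    `reference_year` is an assumption used to turn a birth year into an age.
    """
    df = pd.read_csv(source, parse_dates=["start_time", "end_time"])
    df["duration_in_min"] = df["duration_sec"] / 60
    df["hour"] = df["start_time"].dt.hour
    df["weekday"] = df["start_time"].dt.day_name()
    df["age"] = reference_year - df["member_birth_year"]
    return df


trips = load_trips(io.StringIO(SAMPLE_CSV))
print(trips.shape)
print(trips[["duration_in_min", "hour", "weekday", "age"]])
```

With the real data, `load_trips` would be given the path to the downloaded CSV file instead of the in-memory sample.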

The main findings from the exploration are:
- There are 75 records in which the member's age is over 100.
- Bike use peaks at 8 am and 5 pm, with more than 20,000 trips each.
- The busiest weekdays are Thursday and Tuesday.
- All the trips were recorded in February.
- Most trips last between 5 and 10 minutes.
- Most users are Subscribers (91%); Customers make up the remaining 9%.
- Most users are between 20 and 50 years old.
- Customers take longer trips than Subscribers.
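
Summaries like the ones above can be computed with a few pandas aggregations. The following is a minimal sketch on made-up rows, not the project's actual code; it assumes the derived columns `hour`, `weekday`, `duration_in_min`, and `age` already exist (`hour` and `age` are hypothetical names), so the numbers it produces do not match the real dataset.

```python
import pandas as pd

# Made-up trips (not real data) that already carry the derived columns.
trips = pd.DataFrame({
    "hour": [8, 8, 17, 17, 12],
    "weekday": ["Thursday", "Tuesday", "Thursday", "Monday", "Thursday"],
    "user_type": ["Subscriber", "Subscriber", "Subscriber", "Subscriber", "Customer"],
    "duration_in_min": [7.0, 9.0, 6.0, 8.0, 25.0],
    "age": [34, 27, 119, 45, 30],
})

# Number of trips starting in each hour of the day, in clock order.
trips_per_hour = trips["hour"].value_counts().sort_index()

# Number of trips per weekday, busiest day first.
trips_per_weekday = trips["weekday"].value_counts()

# Share of each user type, as a percentage of all trips.
user_share = trips["user_type"].value_counts(normalize=True) * 100

# Count of records with an implausible age (over 100).
implausible_ages = int((trips["age"] > 100).sum())

# Typical trip duration for each user type.
median_duration = trips.groupby("user_type")["duration_in_min"].median()

print(trips_per_hour, trips_per_weekday, user_share, sep="\n\n")
print("Ages over 100:", implausible_ages)
print(median_duration)
```

The median is used for duration here because a few very long trips can pull the mean upward; that is a design choice for this sketch, not something stated in the project.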
The presentation highlights the busiest weekdays and hours for bike use and the duration of most trips. It also focuses on the age range of the most common users and on the relationships among the following features: weekday, duration_in_min, and user_type.
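
One way to look at the relationship among weekday, duration_in_min, and user_type is a table of average trip duration per weekday and user type. The sketch below uses made-up rows and a hypothetical variable name, `duration_by_day_and_type`; it shows one possible approach, not necessarily the one used in the presentation.

```python
import pandas as pd

# Made-up trips (not real data).
trips = pd.DataFrame({
    "weekday": ["Tuesday", "Tuesday", "Thursday", "Thursday", "Thursday", "Tuesday"],
    "user_type": ["Subscriber", "Customer", "Subscriber", "Customer", "Subscriber", "Subscriber"],
    "duration_in_min": [8.0, 20.0, 6.0, 30.0, 10.0, 12.0],
})

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Mean duration for every (weekday, user_type) pair, with one column per user type.
duration_by_day_and_type = (
    trips.groupby(["weekday", "user_type"])["duration_in_min"]
    .mean()
    .unstack("user_type")
)

# Put the weekdays that occur in the data into calendar order.
present_days = [day for day in weekday_order if day in duration_by_day_and_type.index]
duration_by_day_and_type = duration_by_day_and_type.reindex(present_days)

print(duration_by_day_and_type)
```

If matplotlib is installed, `duration_by_day_and_type.plot(kind="bar")` draws the same table as a grouped bar chart.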