Install the following libraries to run the code in this project:
- pandas
- plotly
- scipy
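
As a quick sanity check before opening the notebook, the sketch below reports which of the libraries listed above are not importable in the current environment. It uses only the standard library; the helper name `missing_packages` is illustrative and does not come from the project.

```python
from importlib.util import find_spec

# Libraries listed in this README.
REQUIRED = ["pandas", "plotly", "scipy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Anything reported as missing can usually be installed with `pip install pandas plotly scipy`.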
My goal with this project was to gain some insight into what a popular book looks like, and I decided to do this by exploring Goodreads data. More specifically, I wanted to answer the following questions:
- Are the format and the size of a book relevant to its popularity?
- What are the most popular genres?
- How are genres related to each other?
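
To make the last question more concrete, here is a minimal sketch of one possible way to look at genre relationships: counting how often two genres are assigned to the same book. This is not necessarily the approach taken in the notebook. The `books` list is made-up toy data rather than rows from the Goodreads dataset, and `genre_pair_counts` is a hypothetical helper. The sketch sticks to the standard library so it runs without any setup.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each book is tagged with a set of genres.
# These are made-up placeholders, not rows from the Goodreads dataset.
books = [
    {"fantasy", "young adult"},
    {"fantasy", "romance"},
    {"fantasy", "young adult", "romance"},
    {"history"},
]


def genre_pair_counts(genre_sets):
    """Count how many books carry each unordered pair of genres."""
    counts = Counter()
    for genres in genre_sets:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a)
        # are counted as the same pair.
        counts.update(combinations(sorted(genres), 2))
    return counts


pair_counts = genre_pair_counts(books)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pairs with high counts are genres that tend to appear together. A book with a single genre, like the "history" entry here, contributes no pairs.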
There is a single notebook in this project which contains all the code used for preparing the data, doing the analysis and setting up the visualizations. The notebook also includes markdown cells and comments that detail each step and explain some decisions in the process.
The most insightful results of this analysis were put together in this post.
Thanks to Manav Dhamani for collecting the data and making it easily available! The data and other descriptive information can be found on Kaggle. Feel free to use the code here however you like!