ZOMATO-RESTAURANT-CLUSTERING-AND-SENTIMENT-ANALYSIS

Problem Statement Zomato is an Indian restaurant aggregator and food delivery start-up founded by Deepinder Goyal and Pankaj Chaddah in 2008. Zomato provides information, menus and user-reviews of restaurants, and also has food delivery options from partner restaurants in select cities.

India is quite famous for its diverse multi cuisine available in a large number of restaurants and hotel resorts, which is reminiscent of unity in diversity. Restaurant business in India is always evolving. More Indians are warming up to the idea of eating restaurant food whether by dining outside or getting food delivered. The growing number of restaurants in every state of India has been a motivation to inspect the data to get some insights, interesting facts and figures about the Indian food industry in each city. So, this project focuses on analysing the Zomato restaurant data for each city in India.

The Project focuses on Customers and Company, you have to analyze the sentiments of the reviews given by the customer in the data and made some useful conclusion in the form of Visualizations. Also, cluster the zomato restaurants into different segments. The data is vizualized as it becomes easy to analyse data at instant. The Analysis also solve some of the business cases that can directly help the customers finding the Best restaurant in their locality and for the company to grow up and work on the fields they are currently lagging in.

This could help in clustering the restaurants into segments. Also the data has valuable information around cuisine and costing which can be used in cost vs. benefit analysis

Data could be used for sentiment analysis. Also the metadata of reviewers can be used for identifying the critics in the industry.

Attribute Information Zomato Restaurant names and Metadata Name : Name of Restaurants

Links : URL Links of Restaurants

Cost : Per person estimated Cost of dining

Collection : Tagging of Restaurants w.r.t. Zomato categories

Cuisines : Cuisines served by Restaurants

Timings : Restaurant Timings

Zomato Restaurant reviews Restaurant : Name of the Restaurant

Reviewer : Name of the Reviewer

Review : Review Text

Rating : Rating Provided by Reviewer

MetaData : Reviewer Metadata - No. of Reviews and followers

Time: Date and Time of Review

Pictures : No. of pictures posted with review

Methods Used

Descriptive Statistics
Data Visualization
Machine Learning -Supervised-Learning -Unsupervised-Learning

Technologies

Python
Pandas
Numpy
Matplotlib
Seaborn
Scikit-learn
NLP

Sentiment analysis

Used Sentiment Intensity Analyzer (vaderSentiment) and RoBerta Pretrained Model -Plotted the distribution of ratings to have an understanding of the proportion of good and bad reviews. -Made many visualizations which include, top 10/bottom 10 restaurants in terms of average rating. -Had a clear understanding of the cost summary of the restaurants. -Done pre-processing such as removing emojis, punctuations and only used Adjectives and verbs to reduce dimensionality.

Clustering

-Did the pre-processing, Clubbed some cuisines together so that one-hot encoding would be possible. -Removed the unwanted variables and Normalized the data. -Clustered the data using K-means as well as Hierarchical clustering.

K Means Clustering

K-means clustering is a widely used unsupervised machine learning technique.Unsupervised algorithms make inferences from datasets using only input vectors and do not relate to known or tagged outcomes. A cluster is a collection of data points that have been grouped together due to particular commonalities. K, for the number of centroids required in the dataset. The centroid is an imaginary or real point that represents the cluster's center.By lowering the in-cluster sum of squares, each data point is assigned to one of the clusters.

To put it another way, the K-means algorithm finds k centroids and then assigns each data point to the closest cluster while keeping the centroids as small as possible. The 'means' in K-means refers to the data being averaged, or determining the centroid. K means is highly vulnerable to high dimensional data. As the dimensions increase it is not able to differentiate between the clusters. Hence, a dimensionality reduction technique called Principal Component Analysis was used.

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
README.md		README.md
ZOMATO_RESTAURANT_CLUSTERING_AND_SENTIMENT_ANALYSIS.ipynb		ZOMATO_RESTAURANT_CLUSTERING_AND_SENTIMENT_ANALYSIS.ipynb
Zomato Restaurant names and Metadata.csv		Zomato Restaurant names and Metadata.csv
Zomato Restaurant reviews.csv		Zomato Restaurant reviews.csv
Zomato_web_Scraping_using_Beautiful_Soup_.ipynb		Zomato_web_Scraping_using_Beautiful_Soup_.ipynb
Zomato_web_Scrapping_using_Beautiful_Soup_.ipynb		Zomato_web_Scrapping_using_Beautiful_Soup_.ipynb
new_meta.csv		new_meta.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ZOMATO-RESTAURANT-CLUSTERING-AND-SENTIMENT-ANALYSIS

Methods Used

Technologies

Sentiment analysis

Clustering

K Means Clustering

About

Releases

Packages

Languages

NaithaniAR/CapstoneProject-MLClustering

Folders and files

Latest commit

History

Repository files navigation

ZOMATO-RESTAURANT-CLUSTERING-AND-SENTIMENT-ANALYSIS

Methods Used

Technologies

Sentiment analysis

Clustering

K Means Clustering

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages