GitHub - siddharthkhonde/qqp-similarity: Quora Question Pair Similarity Prediction

Framed the task as binary classification that predicts the probability that a question pair is similar, using logarithmic loss (log-loss) as the performance metric and a binary confusion matrix to analyze errors.
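
A minimal sketch of the two evaluation tools named above, written in plain Python so the arithmetic is visible. The labels and probabilities are made up for illustration, and the 0.5 decision threshold is an assumption, not something the repository states.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    matrix = [[0, 0], [0, 0]]
    for y, y_hat in zip(y_true, y_pred):
        matrix[y][y_hat] += 1
    return matrix

# Hypothetical labels (1 = similar pair) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

# Assumed decision threshold of 0.5 to turn probabilities into labels.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

loss = log_loss(y_true, y_prob)
cm = confusion_matrix(y_true, y_pred)
print(f"log-loss: {loss:.4f}")
print(f"confusion matrix [[TN, FP], [FN, TP]]: {cm}")
```

Log-loss penalizes confident wrong probabilities heavily, while the confusion matrix shows which kind of error (false positive or false negative) the thresholded predictions make.
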
Performed EDA using statistical methods to understand the data; implemented basic feature engineering to increase interpretability.
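
The repository description does not list the basic features, so the following is only a sketch of what such features might look like for a question pair. The function name `basic_features` and every feature name in it are hypothetical.

```python
def basic_features(q1, q2):
    """Compute simple, interpretable features for one question pair."""
    w1 = set(q1.lower().split())
    w2 = set(q2.lower().split())
    common = len(w1 & w2)          # unique words shared by both questions
    total = len(w1) + len(w2)      # unique words in q1 plus unique words in q2
    return {
        "q1_len": len(q1),                 # character length
        "q2_len": len(q2),
        "q1_n_words": len(q1.split()),     # token count
        "q2_n_words": len(q2.split()),
        "word_common": common,
        "word_total": total,
        "word_share": common / total if total else 0.0,
    }

# Hypothetical question pair.
features = basic_features("How do I learn Python", "How can I learn Python fast")
for name, value in features.items():
    print(name, value)
```

Features of this kind are easy to inspect during EDA, for example by plotting their distributions separately for similar and dissimilar pairs.
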
Applied text preprocessing to remove HTML tags, punctuation, and stop words, followed by stemming; performed advanced feature extraction to split the questions into tokens and analyze their similarity.
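
The preprocessing steps above can be sketched with the standard library alone. This is an illustration under assumptions: the stop-word set is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as the ones NLTK provides.

```python
import re
import string

# Hypothetical, deliberately small stop-word set for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "i", "to", "of", "in", "what", "how"}

# Map every punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def naive_stem(word):
    """Crude stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags
    text = text.lower().translate(PUNCT_TABLE)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(naive_stem(t) for t in tokens)

cleaned = preprocess("<b>What</b> is the best way of learning languages?")
print(cleaned)  # best way learn languag
```

Stop-word removal and stemming are separate steps: the first filters tokens out, the second reduces the remaining tokens to a root form.
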
Visualized 15-D data in 2-D using t-SNE to gain more detailed insights; incorporated NLP techniques to convert words to vectors.
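
A hedged sketch of the 15-D to 2-D projection using scikit-learn's `TSNE`. The random matrix `X` is a placeholder for the 15 engineered features, and the scaling step, sample count, and perplexity value are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Placeholder data: 60 question pairs, 15 features each, with random labels.
X = rng.normal(size=(60, 15))
y = rng.integers(0, 2, size=60)

# Scale features to [0, 1] so no single feature dominates the distances.
X_scaled = MinMaxScaler().fit_transform(X)

# Perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X_scaled)

print(embedding.shape)
```

The two columns of `embedding` can then be drawn as a scatter plot colored by `y`, for example with Matplotlib or Seaborn, to see whether the two classes separate.
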
Trained Logistic Regression, Linear SVM, and XGBoost models to predict the outcome, comparing their errors to select the model with the lowest log-loss.
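
The model comparison can be sketched as follows on synthetic data. This is not the notebook's code: the dataset comes from `make_classification`, hyperparameters are left near their defaults, and XGBoost is omitted to keep the example dependent on scikit-learn only. Because a linear SVM does not output probabilities, the sketch wraps it in `CalibratedClassifierCV` so that log-loss can be computed; that wrapper is an assumption about how one might do it.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the engineered question-pair features.
X, y = make_classification(n_samples=400, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    # Calibration turns SVM decision scores into probabilities.
    "linear_svm": CalibratedClassifierCV(LinearSVC(max_iter=5000), cv=3),
}

losses = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    losses[name] = log_loss(y_test, proba)

best = min(losses, key=losses.get)
for name, value in losses.items():
    print(f"{name}: log-loss = {value:.4f}")
print("lowest log-loss:", best)
```

A gradient-boosted model could be added to the `models` dictionary in the same way, as long as it exposes `fit` and `predict_proba`.
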

Technologies: pandas, NumPy, Matplotlib, Seaborn, Plotly, scikit-learn, NLTK, SQLite, tqdm, WordCloud, KNN Classifier, Gaussian Naive Bayes, Logistic Regression, Random Forest Classifier, Linear SVM, XGBoost

Files: README.md, qqp-similarity.ipynb (4 commits)
