siddharthkhonde/qqp-similarity


  1. Framed the problem as binary classification with probabilistic outputs, using logarithmic loss (log loss) as the performance metric and a binary confusion matrix to analyze errors.
  2. Performed exploratory data analysis (EDA) using statistical methods to understand the data; implemented basic feature engineering to improve interpretability.
  3. Applied text preprocessing to remove HTML tags, punctuation, and stop words, followed by stemming; performed advanced feature extraction to split the questions and analyze their similarity.
  4. Visualized the 15-dimensional data in 2-D using t-SNE to gain more detailed insights; used NLP techniques to convert words to vectors.
  5. Trained Logistic Regression, Linear SVM, and XGBoost models on the data and compared their errors to find the model that minimizes them.
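
The evaluation described in steps 1 and 5 can be sketched as follows. This is a minimal, dependency-free illustration rather than the code used in this repository (which lists Sklearn among its technologies). The function names, the 0.5 threshold, and the sample labels and probabilities are assumptions made for the example.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] after thresholding the probabilities."""
    matrix = [[0, 0], [0, 0]]
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        matrix[y][pred] += 1
    return matrix


# Hypothetical labels (1 = similar question pair) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

loss = log_loss(y_true, y_prob)
cm = confusion_matrix(y_true, y_prob)
print(f"log loss: {loss:.4f}")
print(f"confusion matrix [[TN, FP], [FN, TP]]: {cm}")
```

Log loss penalizes confident wrong probabilities more heavily than hesitant ones, which is why it suits a model that outputs probabilities, while the confusion matrix shows which kind of error (false positive or false negative) a given threshold produces.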

Technologies: pandas, NumPy, Matplotlib, Seaborn, Plotly, scikit-learn, NLTK, SQLite, tqdm, WordCloud, KNN Classifier, Gaussian Naive Bayes, Logistic Regression, Random Forest Classifier, Linear SVM, XGBoost
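
As an illustration of the text preprocessing and question-similarity features mentioned in step 3, here is a small sketch using only the Python standard library. The stop-word list, the function names, and the specific features (`common_words`, `total_words`, `word_share`) are hypothetical choices for this example and are not taken from the repository; stemming is omitted here and would typically be done with a library such as NLTK.

```python
import html
import re

# A tiny, hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "what", "how", "do", "i", "to", "of", "in"}


def preprocess(text):
    """Strip HTML tags and punctuation, lowercase, and drop stop words."""
    text = html.unescape(text)                     # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)           # remove HTML tags
    text = re.sub(r"[^\w\s]", " ", text.lower())   # remove punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def overlap_features(q1, q2):
    """Simple word-overlap features for a pair of questions."""
    w1, w2 = set(preprocess(q1)), set(preprocess(q2))
    common = len(w1 & w2)
    total = len(w1) + len(w2)
    return {
        "common_words": common,
        "total_words": total,
        "word_share": common / total if total else 0.0,
    }


features = overlap_features(
    "<b>What is</b> the capital of India?",
    "Which city is India's capital?",
)
print(features)
```

Features like these turn a pair of free-text questions into a handful of numbers that a classifier can consume directly.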

About

Quora Question Pair Similarity Prediction
