The first part of the task was to find the 5 most relevant paragraphs for a given question, and then to extract the piece of text that best answered it, if an answer was present.

Approach:
- Preprocessed the data to remove problems in the dataset, such as columns shifted to the right or rows broken off mid-way.
- Used FAISS to retrieve the top-k paragraphs matching the question, since it provides fast similarity search over dense vectors.
- Used the cross-encoder ms-marco-MiniLM-L-6-v2 to re-rank the top-k paragraphs by how well each one fits the question in context (a sketch of the retrieval + re-ranking stage follows this list).
- Used an ELECTRA model fine-tuned on SQuAD 2.0 to extract answers from the selected paragraph. This was chosen because a large part of the corpus was taken from the SQuAD dataset, so the model had already been trained on much of it (a sketch of this stage appears after the summary below).
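Below is a minimal sketch of how the retrieval and re-ranking stage can be wired together. The paragraph list, the bi-encoder model all-MiniLM-L6-v2, and the example question are illustrative assumptions, not taken from the repository; only the cross-encoder name comes from the approach above.

```python
# Minimal sketch of the retrieval + re-ranking stage.
# Assumptions: `paragraphs` holds the preprocessed corpus; the bi-encoder
# "all-MiniLM-L6-v2" is an illustrative choice of sentence embedder.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

paragraphs = [
    "FAISS is a library for efficient similarity search of dense vectors.",
    "ELECTRA is a transformer pre-trained as a discriminator.",
    "SQuAD 2.0 adds unanswerable questions to the original SQuAD dataset.",
    # ... the real corpus paragraphs go here
]

# 1. Embed every paragraph and build a FAISS index (inner product on
#    L2-normalised vectors == cosine similarity).
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = bi_encoder.encode(paragraphs, convert_to_numpy=True).astype("float32")
faiss.normalize_L2(embeddings)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

# 2. Retrieve the top-k candidate paragraphs for a question.
def retrieve(question, k=3):
    q = bi_encoder.encode([question], convert_to_numpy=True).astype("float32")
    faiss.normalize_L2(q)
    _, ids = index.search(q, k)
    return [paragraphs[i] for i in ids[0] if i >= 0]

# 3. Re-rank the candidates with the cross-encoder ms-marco-MiniLM-L-6-v2.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question, candidates):
    scores = reranker.predict([(question, p) for p in candidates])
    order = np.argsort(scores)[::-1]
    return [candidates[i] for i in order]

question = "What is FAISS used for?"  # placeholder question
top_paragraphs = rerank(question, retrieve(question, k=3))
print(top_paragraphs[0])
```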
Together, these stages provide fast search and answering of queries across the large corpus.
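For the answer-extraction stage, here is a minimal sketch using the Hugging Face question-answering pipeline. The checkpoint deepset/electra-base-squad2 is assumed as one publicly available ELECTRA model fine-tuned on SQuAD 2.0; the example question and paragraph are placeholders.

```python
# Minimal sketch of the answer-extraction stage.
# Assumption: "deepset/electra-base-squad2" is one publicly available ELECTRA
# checkpoint fine-tuned on SQuAD 2.0; the exact checkpoint used may differ.
from transformers import pipeline

qa_model = pipeline("question-answering", model="deepset/electra-base-squad2")

def extract_answer(question: str, paragraph: str):
    # SQuAD 2.0-style models can also decide the paragraph contains no answer;
    # handle_impossible_answer=True lets the pipeline return an empty span then.
    result = qa_model(
        question=question,
        context=paragraph,
        handle_impossible_answer=True,
    )
    return result["answer"], result["score"]

answer, score = extract_answer(
    "What is FAISS used for?",  # placeholder question
    "FAISS is a library for efficient similarity search of dense vectors.",
)
print(answer, score)
```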
Results:
The FAISS + cross-encoder combination gives clearly strong results in finding the top paragraphs.
Final question answering from the retrieved paragraph:
Fine-tuning the models on the current data would have produced much better results, but due to time and resource constraints, I wasn't able to do so.
How to run:
- Update the path variables in para_finder.ipynb, comment out the third-to-last cell, and pass the query as run_search(str(query_text), num_results_to_print), where query_text is the question and num_results_to_print is the number of top paragraphs needed (see the example call after this list).
- Run final_integrated to get the answer. Pretrained weights for final_integrated: https://drive.google.com/file/d/1ubh0X_o1sdgmZyIdqiuFXsZEd726QyDA/view?usp=sharing (quantized.pt)
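A hypothetical example of the retrieval call; the question text and the result count are placeholders:

```python
query_text = "When was the company founded?"  # placeholder question
num_results_to_print = 5                      # number of top paragraphs to print
run_search(str(query_text), num_results_to_print)
```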