Hands-on workshop for building a semantic search engine.
If you came to this repo during a workshop, visit this custom JupyterHub, which has all the dependencies already set up.
- Data Fetching: internal notebooks that show how to fetch a dump of the Stack Overflow XML.
- Data Processing: notebook that processes the XML dump and saves it to smaller Parquet files.
- Non Deep Learning Retrieval: shows how to index and retrieve documents using Elasticsearch.
- Deep Learning Retrieval
  - Shows how to index and retrieve documents using a fine-tuned Deep Learning Retriever. Link
  - Sample notebook for a cross-encoder, taken from the SentenceTransformers docs. Link
- ANN: shows how to speed up Deep Learning retrieval by exploring different approximate nearest neighbor (ANN) indexes. Link
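
To give a feel for what the ANN notebook explores, here is a minimal sketch that compares exact (brute-force) search with a toy approximate index built on random-hyperplane hashing. It uses only the Python standard library and random vectors in place of real embeddings. The `HyperplaneLSH` class, its parameters, and the data are illustrative assumptions, not code from the workshop notebooks, which may rely on dedicated ANN libraries instead.

```python
import math
import random
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def exact_search(query, vectors, k):
    """Brute force: score every stored vector against the query."""
    scores = [(dot(query, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]


class HyperplaneLSH:
    """Toy ANN index (hypothetical): vectors that fall on the same side of
    every random hyperplane share a bucket, and only that bucket is scored."""

    def __init__(self, dim, n_planes, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_planes)]
        self.buckets = defaultdict(list)
        self.vectors = []

    def _key(self, v):
        # One boolean per hyperplane: which side of the plane the vector is on.
        return tuple(dot(p, v) >= 0 for p in self.planes)

    def add(self, v):
        self.buckets[self._key(v)].append(len(self.vectors))
        self.vectors.append(v)

    def search(self, query, k):
        # Score only the candidates in the query's bucket.
        candidates = self.buckets.get(self._key(query), [])
        scores = [(dot(query, self.vectors[i]), i) for i in candidates]
        return [i for _, i in sorted(scores, reverse=True)[:k]]


# Random unit vectors stand in for document embeddings (assumption).
rng = random.Random(42)
dim = 16
docs = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(2000)]

index = HyperplaneLSH(dim, n_planes=6)
for d in docs:
    index.add(d)

query = docs[0]
exact = exact_search(query, docs, k=5)
approx = index.search(query, k=5)

overlap = len(set(exact) & set(approx))
print(f"approximate search recovered {overlap} of the top 5 exact results")
```

The approximate index scores far fewer vectors per query, at the cost of possibly missing some true neighbors. Increasing `n_planes` makes buckets smaller (faster, lower recall), which is the kind of speed versus accuracy trade-off that comparing different ANN indexes brings out.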
[ODSC 2022 Slides](assets/slides_odsc2022.pdf)
For help or feedback, please reach out to: