A toolkit for building, searching, and evaluating search systems using blog posts from isaacflath.com. This is intended as a starter or demo app for educational purposes.
This is a companion repo for the following blog posts:
This project provides tools to:
- Create search indices from blog post content (lancedb vectors and bm25 corpus)
- Search blog posts using various retrieval methods (including a hybrid search -> re-ranking search)
- Evaluate and compare search results through an interactive web application
Note: This is deployed on Railway. It is a demo and not designed for multiple users, so it will probably be slow if many people are using it at once.
The main.py script provides a web interface for testing and evaluating search results:
python main.py
Features:
- Interactive search interface with multiple retrieval methods
- Relevance rating system (1-5 scale)
- Notes and annotations for search results
- Historical evaluation tracking and comparison
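To make the rating and tracking features concrete, here is a minimal sketch of how one evaluation could be recorded and compared across methods. The `Evaluation` class, its field names, and `mean_rating_by_method` are hypothetical and are not taken from main.py; only the 1-5 scale, the notes field, and the method names come from this README.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Evaluation:
    """One relevance judgment for a (query, result) pair. Hypothetical schema."""
    query: str
    method: str      # e.g. "vector", "keyword", "hybrid", "rerank"
    result_id: str   # hypothetical identifier for the returned post
    rating: int      # relevance on the 1-5 scale
    notes: str = ""  # free-form annotation
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Reject ratings outside the 1-5 scale.
        if not 1 <= self.rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {self.rating}")


def mean_rating_by_method(evaluations):
    """Average rating per retrieval method, for comparing methods over time."""
    totals = {}
    for ev in evaluations:
        total, count = totals.get(ev.method, (0, 0))
        totals[ev.method] = (total + ev.rating, count + 1)
    return {method: total / count for method, (total, count) in totals.items()}
```

Averaging per method is only one possible comparison; the app may track history differently.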
rendered_posts/: Contains blog posts from isaacflath.com in both HTML and Markdown formats. create_search_index.py uses these, and they are provided for your experimentation.
The create_search_index.py script processes blog posts and creates search indices:
python create_search_index.py [--rerun]
Options:
- --rerun: Force regeneration of embeddings and indices (otherwise skips if they already exist)
Outputs:
- blog_search.db: LanceDB database containing vector embeddings
- bm25_corpus.pkl: Pickled corpus for BM25 keyword search
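The skip-unless-forced behaviour of --rerun can be sketched as a simple existence check on the two output files. This is an assumption about how such a check might look, not the script's actual code: `needs_rebuild` and `OUTPUTS` are hypothetical names, and only the two file names come from this README.

```python
from pathlib import Path

# Output names taken from this README; the check itself is an assumption.
OUTPUTS = ("blog_search.db", "bm25_corpus.pkl")


def needs_rebuild(rerun: bool, base_dir: Path = Path("."), outputs=OUTPUTS) -> bool:
    """Return True when the indices should be (re)generated."""
    if rerun:
        # --rerun forces regeneration even if outputs already exist.
        return True
    # Otherwise rebuild only if at least one output is missing.
    return not all((base_dir / name).exists() for name in outputs)
```

A caller would then build the indices only when `needs_rebuild(args.rerun)` is true.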
The search_blog.py script provides a command-line interface for searching blog posts:
python search_blog.py "your search query" [--top-k N] [--method METHOD]
Options:
- --top-k N: Number of results to return (default: 3)
- --method METHOD: Search method to use (choices: vector, keyword, hybrid, rerank; default: rerank)
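For reference, a command line with these options could be declared with argparse roughly as follows. This is a sketch that mirrors the documented flags and defaults; `build_parser` and `METHODS` are hypothetical names, and the real script may be structured differently.

```python
import argparse

METHODS = ("vector", "keyword", "hybrid", "rerank")


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional query plus the --top-k and --method options."""
    parser = argparse.ArgumentParser(description="Search blog posts")
    parser.add_argument("query", help="Search query text")
    parser.add_argument("--top-k", type=int, default=3,
                        help="Number of results to return")
    parser.add_argument("--method", choices=METHODS, default="rerank",
                        help="Search method to use")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["your search query", "--top-k", "5", "--method", "hybrid"])
print(args.query, args.top_k, args.method)
```

argparse converts `--top-k` to the attribute `top_k`, and `choices` rejects any method outside the four listed.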
Search methods:
- vector: Dense retrieval using sentence embeddings
- keyword: Sparse retrieval using BM25 algorithm
- hybrid: Combined vector and keyword search
- rerank: Two-stage retrieval with cross-encoder reranking
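One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below illustrates that technique only; this README does not say how the hybrid method merges results, so treat RRF, the function name, and the example post ids as assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result ids from the two first-stage retrievers.
vector_hits = ["post-a", "post-b", "post-c"]
keyword_hits = ["post-b", "post-d", "post-a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['post-b', 'post-a', 'post-d', 'post-c']
```

In a two-stage setup like the rerank method, the top fused candidates would then be rescored by a cross-encoder, which reads the query and each candidate together.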
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
- Run python main.py and navigate to localhost:5001 to use the app
- Run create_search_index.py to build search indices
- Use search_blog.py for command-line searching