# Explore semantic caching to reduce your OpenAI/LLM API bill
This repository contains a Python application that demonstrates semantic caching: before calling the LLM, the app searches a cache for semantically similar questions and reuses their answers. It compares the performance of two embedding methods, OpenAI and ONNX.

## Features
- Streamlit web application to test and evaluate semantic caching.
- CLI for testing exact, semantic, and no cache.
- ONNX and OpenAI embeddings.
- FAISS search for fast similarity search.
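The core idea can be sketched in a few lines (a minimal illustration with toy bag-of-words embeddings and cosine similarity; `SemanticCache`, `toy_embed`, and the threshold are hypothetical, not the repository's actual API, which uses real embeddings and FAISS):

```python
import math

# Hypothetical minimal semantic cache -- not this repository's actual API.
class SemanticCache:
    """Return a stored answer when a new question's embedding is
    close enough (cosine similarity) to a previously cached one."""

    def __init__(self, embed_fn, threshold=0.8):
        self.embed_fn = embed_fn    # maps text -> embedding vector
        self.threshold = threshold  # similarity cutoff for a "hit"
        self.entries = []           # list of (embedding, answer) pairs

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def get(self, question):
        q = self.embed_fn(question)
        for vec, answer in self.entries:
            if self._cosine(q, vec) >= self.threshold:
                return answer       # semantic hit: skip the LLM call
        return None                 # miss: caller must query the LLM

    def put(self, question, answer):
        self.entries.append((self.embed_fn(question), answer))

# Toy embedding: bag-of-words counts over a fixed vocabulary.
VOCAB = ["what", "is", "the", "capital", "of", "france", "weather"]

def toy_embed(text):
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]

cache = SemanticCache(toy_embed)
cache.put("What is the capital of France?", "Paris")
print(cache.get("Capital of France is what?"))  # rephrasing still hits
print(cache.get("What is the weather today?"))  # unrelated: miss
```

In the real application, the toy embedding is replaced by OpenAI or ONNX embeddings, and the linear scan over cached vectors is replaced by a FAISS index for fast similarity search.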
## Installation

To install this project, you need to have Python 3.10 installed. Then follow these steps:

- Clone the repository
- Enter the project directory
- Install the project:

  ```shell
  poetry install
  ```

- Set up your OpenAI API key in the `.env` file.
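The `.env` file holds the key as an environment variable (the variable name `OPENAI_API_KEY` is the one conventionally read by OpenAI clients and `python-dotenv`; confirm against this project's code):

```shell
# .env -- loaded at startup via python-dotenv
# Variable name is an assumption; check the project's settings code.
OPENAI_API_KEY=sk-...
```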
## Usage

### CLI

To run the CLI, use the following command:

```shell
poetry run cli run <cache_type>
```

Replace `<cache_type>` with `no_cache` or `semantic_cache`.
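The cache modes differ in how a lookup is matched: no cache always calls the model, an exact cache keys on the literal question string, and a semantic cache matches on embedding similarity. A hypothetical sketch (not this repository's implementation) of why an exact cache alone is brittle:

```python
calls = 0

def fake_llm(question):
    """Stand-in for an expensive LLM API call."""
    global calls
    calls += 1
    return f"answer to: {question}"

# Exact cache: a plain dict keyed on the literal question string.
exact_cache = {}

def ask(question):
    if question not in exact_cache:        # hit only on identical text
        exact_cache[question] = fake_llm(question)
    return exact_cache[question]

ask("What is the capital of France?")
ask("What is the capital of France?")  # exact hit: no new API call
ask("France's capital is what?")       # rephrasing: exact match misses
print(calls)  # 2 -- a semantic cache would also answer the rephrasing
```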
### Web app

To run the Streamlit web app, use the following command:

```shell
poetry run webapp
```

The app will be available at `localhost:8501`.
## Project structure

- `pyproject.toml`: project metadata and dependencies.
- `scripts/`: Streamlit app and CLI scripts.
- `semantic_caching/`: core caching logic.
- `cache/`: stored cache files (FAISS indices and SQLite databases).

## Dependencies
- langchain
- openai
- streamlit
- python-dotenv
- gptcache
- tiktoken
- rich
- torch
- typer
## Contributing

We welcome contributions to this project! Please feel free to submit issues or pull requests.
## License

This project is licensed under the MIT License.