Pgvector template (langchain-ai#13267)

Including pgvector template, adapting what is covered in the [cookbook](https://github.com/langchain-ai/langchain/blob/master/cookbook/retrieval_in_sql.ipynb).

---------

Co-authored-by: Lance Martin <[email protected]>
Co-authored-by: Erick Friis <[email protected]>
daniel-cohere · Nov 14, 2023 · 58f5a4d
1 parent be85422
commit 58f5a4d
Showing 9 changed files with 2,114 additions and 0 deletions.
diff --git a/templates/sql-pgvector/.gitignore b/templates/sql-pgvector/.gitignore
@@ -0,0 +1 @@
+__pycache__
diff --git a/templates/sql-pgvector/LICENSE b/templates/sql-pgvector/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 LangChain, Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/templates/sql-pgvector/README.md b/templates/sql-pgvector/README.md
@@ -0,0 +1,105 @@
+# sql-pgvector
+
+This template enables a user to use `pgvector` to combine PostgreSQL with semantic search / RAG.
+
+It uses the [PGVector](https://github.com/pgvector/pgvector) extension, as shown in the [RAG empowered SQL cookbook](cookbook/retrieval_in_sql.ipynb).
+
+## Environment Setup
+
+If you are using `ChatOpenAI` as your LLM, make sure `OPENAI_API_KEY` is set in your environment. You can change both the LLM and the embeddings model inside `chain.py`.
+
+You can also configure the following environment variables
+for use by the template (defaults are in parentheses):
+
+- `POSTGRES_USER` (postgres)
+- `POSTGRES_PASSWORD` (test)
+- `POSTGRES_DB` (vectordb)
+- `POSTGRES_HOST` (localhost)
+- `POSTGRES_PORT` (5432)
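As a minimal sketch of how these variables and their defaults might be combined into a connection string: the function name `build_connection_uri` and the SQLAlchemy-style `postgresql+psycopg2://` scheme are assumptions made for illustration, and the template's own `chain.py` may assemble its connection differently.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus

# Defaults mirror the values listed in parentheses above.
DEFAULTS = {
    "POSTGRES_USER": "postgres",
    "POSTGRES_PASSWORD": "test",
    "POSTGRES_DB": "vectordb",
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
}


def build_connection_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a database URI from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    cfg = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Percent-encode credentials so characters such as "@" or ":" stay valid in a URI.
    user = quote_plus(cfg["POSTGRES_USER"])
    password = quote_plus(cfg["POSTGRES_PASSWORD"])
    host, port, db = cfg["POSTGRES_HOST"], cfg["POSTGRES_PORT"], cfg["POSTGRES_DB"]
    # Assumed SQLAlchemy-style URI scheme; adjust to the driver you actually use.
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{db}"


if __name__ == "__main__":
    print(build_connection_uri())
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to check without touching the real environment.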
+
+If you don't have a Postgres instance, you can run one locally in Docker:
+
+```bash
+docker run \
+  --name some-postgres \
+  -e POSTGRES_PASSWORD=test \
+  -e POSTGRES_USER=postgres \
+  -e POSTGRES_DB=vectordb \
+  -p 5432:5432 \
+  postgres:16
+```
+
+To start the container again later, use the `--name` defined above:
+```bash
+docker start some-postgres
+```
+
+### PostgreSQL Database setup
+
+Apart from having the `pgvector` extension enabled, you will need to do some setup before you can run semantic search within your SQL queries.
+
+To run RAG over your PostgreSQL database, you will need to generate embeddings for the specific columns you want to search.
+
+This process is covered in the [RAG empowered SQL cookbook](cookbook/retrieval_in_sql.ipynb), but the overall approach consists of:
+1. Querying for unique values in the column
+2. Generating embeddings for those values
+3. Storing the embeddings in a separate column or in an auxiliary table
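The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the cookbook's code: the table and column names (`tracks`, `name`, `name_embedding`) and all helper names are hypothetical, the `%s` placeholders assume a psycopg2-style DB-API cursor, and `embed` stands for any callable that maps a list of strings to a list of float vectors (for example, an embeddings object's `embed_documents` method).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical names for illustration only; substitute your own table and columns.
TABLE = "tracks"
COLUMN = "name"
EMBEDDING_COLUMN = "name_embedding"

# Step 1: query the unique values of the column.
SELECT_UNIQUE_SQL = (
    f'SELECT DISTINCT "{COLUMN}" FROM "{TABLE}" WHERE "{COLUMN}" IS NOT NULL'
)
# Step 3: store each embedding next to the rows that hold the matching value.
UPDATE_SQL = (
    f'UPDATE "{TABLE}" SET "{EMBEDDING_COLUMN}" = %s::vector WHERE "{COLUMN}" = %s'
)


def add_column_sql(dimensions: int) -> str:
    """DDL that adds a pgvector column sized for the embedding model."""
    return (
        f'ALTER TABLE "{TABLE}" ADD COLUMN IF NOT EXISTS '
        f'"{EMBEDDING_COLUMN}" vector({int(dimensions)})'
    )


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format a vector in pgvector's text input form, e.g. [0.1,0.2]."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def build_update_rows(
    values: Sequence[str], vectors: Sequence[Sequence[float]]
) -> List[Tuple[str, str]]:
    """Pair each value with its embedding, ordered to match UPDATE_SQL."""
    if len(values) != len(vectors):
        raise ValueError("expected exactly one embedding per value")
    return [(to_pgvector_literal(vec), val) for val, vec in zip(values, vectors)]


def backfill(cursor, embed: Callable[[List[str]], List[List[float]]]) -> int:
    """Run steps 1 to 3 on a DB-API cursor; returns the number of unique values."""
    cursor.execute(SELECT_UNIQUE_SQL)
    values = [row[0] for row in cursor.fetchall()]
    if not values:
        return 0
    vectors = embed(values)  # Step 2: generate embeddings for the unique values.
    cursor.execute(add_column_sql(len(vectors[0])))
    cursor.executemany(UPDATE_SQL, build_update_rows(values, vectors))
    return len(values)
```

Embedding only the distinct values keeps the number of embedding calls proportional to the column's cardinality rather than to the row count. Remember to commit the transaction on the connection after calling `backfill`.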
+
+## Usage
+
+To use this package, you should first have the LangChain CLI installed:
+
+```shell
+pip install -U langchain-cli
+```
+
+To create a new LangChain project and install this as the only package, you can do:
+
+```shell
+langchain app new my-app --package sql-pgvector
+```
+
+If you want to add this to an existing project, you can just run:
+
+```shell
+langchain app add sql-pgvector
+```
+
+And add the following code to your `server.py` file:
+```python
+from sql_pgvector import chain as sql_pgvector_chain
+
+add_routes(app, sql_pgvector_chain, path="/sql-pgvector")
+```
+
+(Optional) Let's now configure LangSmith.
+LangSmith will help us trace, monitor and debug LangChain applications.
+LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/).
+If you don't have access, you can skip this section.
+
+
+```shell
+export LANGCHAIN_TRACING_V2=true
+export LANGCHAIN_API_KEY=<your-api-key>
+export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
+```
+
+If you are inside this directory, then you can spin up a LangServe instance directly by:
+
+```shell
+langchain serve
+```
+
+This will start the FastAPI app with a server running locally at
+[http://localhost:8000](http://localhost:8000)
+
+We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
+We can access the playground at [http://127.0.0.1:8000/sql-pgvector/playground](http://127.0.0.1:8000/sql-pgvector/playground)  
+
+We can access the template from code with:
+
+```python
+from langserve.client import RemoteRunnable
+
+runnable = RemoteRunnable("http://localhost:8000/sql-pgvector")
+```