A CLI demo app that uses Oso to authorize context vector embeddings before sending them to a RAG chatbot.
- Docker
- Node.js
- An OpenAI API key to generate embeddings and chatbot responses.
- NOTE: The requests to OpenAI will incur charges, but they're low. For reference, I made 53 requests to the API while testing this app and incurred USD $0.01 in charges.
- Supabase local environment (installed via npm/docker)
- Oso Dev Server (installed via docker)
- Docker Desktop
- Node.js
- Oso Dev Server
docker run -p '8080:8080' public.ecr.aws/osohq/dev-server:latest
- Local Supabase environment
npx supabase init
npx supabase start
- An OpenAI API Key
SSH
git clone git@github.com:osohq/oso-rag-chatbot.git
HTTPS
git clone https://github.com/osohq/oso-rag-chatbot.git
cd oso-rag-chatbot
npm install
export DATABASE_URL="postgresql://postgres:postgres@localhost:54322/postgres"
export OPENAI_API_KEY=YOUR_OPENAI_API_KEY
npx supabase db reset
npm run initialize
npm run start
The app models a company chatbot. It can accept context from internal company documents, which are stored in the Supabase PostgreSQL database you created above.
The chatbot is aware of two users:
- Bob: An engineering employee who received a bad review. Part of the review feedback was from Alice, a coworker who said that he's "horrible to work with."
- Diane: An HR employee who has access to everyone's review data
The chatbot can answer two questions using the context that's available in the database:
- Why did Bob get a bad review?
- When are the company holidays?
When Bob asks the chatbot "Why did Bob get a bad review?", he should only see the generalized feedback from the review his manager shared with him.
When Diane asks the chatbot "Why did Bob get a bad review?", she should see both the generalized feedback and Alice's pointed remark.
Any user can see the company holidays.
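The access rules above can be sketched as a small in-memory check. This is only an illustration of the intended outcome, not how the app decides access: the app delegates that decision to Oso. The block ids, the `isPublic` and `sharedWith` fields, the `department` field, the holiday placeholder text, and the `canRead` and `authorizedBlockIds` helpers are all hypothetical.

```javascript
// Toy data: only the two quoted review texts come from this README.
// Ids, flags, and the holiday placeholder are made up for illustration.
const blocks = [
  { id: 1, content: "(company holiday schedule)", isPublic: true, sharedWith: [] },
  { id: 2, content: "Bob should work on being more collaborative.", isPublic: false, sharedWith: ["Bob"] },
  { id: 3, content: "Alice says that Bob is horrible to work with", isPublic: false, sharedWith: [] },
];

const users = {
  Bob: { name: "Bob", department: "Engineering" },
  Diane: { name: "Diane", department: "HR" },
};

// Hypothetical rules: public blocks are visible to everyone, HR sees
// everything, and other users see only what was shared with them.
function canRead(user, block) {
  return (
    block.isPublic ||
    user.department === "HR" ||
    block.sharedWith.includes(user.name)
  );
}

// Returns the ids of the blocks the user may see, in storage order.
function authorizedBlockIds(user) {
  return blocks.filter((block) => canRead(user, block)).map((block) => block.id);
}

console.log(authorizedBlockIds(users.Bob));   // [ 1, 2 ]
console.log(authorizedBlockIds(users.Diane)); // [ 1, 2, 3 ]
```

In this sketch Bob never receives Alice's remark as context, so the chatbot cannot repeat it to him, while Diane receives every block.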
There are two required environment variables and three optional ones:
DATABASE_URL
: (required) The connection string to the PostgreSQL instance that contains the chatbot data. If you're using the default Supabase setup, set this to postgresql://postgres:postgres@localhost:54322/postgres
OPENAI_API_KEY
: (required) The API key used for OpenAI LLM operations. You can get a key from OpenAI after creating an account.
OSO_URL
: (optional) The URL of the Oso Cloud instance used for authorization. Defaults to http://localhost:8080
OSO_AUTH
: (optional) The API key used for Oso Cloud operations. Defaults to the Local Dev Server key: e_0123456789_12345_osotesttoken01xiIn
DEBUG
: (optional) Used to emit various debugging information. Unset by default (no debug logging). Can be set to any combination of the following as a comma-delimited string:
main
: Emit the context sent to the chatbot as plain text.
authz
: Emit information about authorization operations.
data
: Emit information about database operations.
llm
: Emit information about LLM operations.
embedding
: Emit the vector embeddings generated by OpenAI.
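As a rough illustration of how these variables and their defaults fit together, here is a minimal sketch of a configuration loader. The `loadConfig` function, the returned field names, and the error message are assumptions made for this example and are not taken from the app's source; only the variable names and default values come from the list above.

```javascript
// Defaults as documented above for the local Oso Dev Server.
const DEFAULT_OSO_URL = "http://localhost:8080";
const DEFAULT_OSO_AUTH = "e_0123456789_12345_osotesttoken01xiIn";

// Hypothetical helper: validates required variables and applies defaults.
function loadConfig(env = process.env) {
  for (const name of ["DATABASE_URL", "OPENAI_API_KEY"]) {
    if (!env[name]) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
  }

  return {
    databaseUrl: env.DATABASE_URL,
    openaiApiKey: env.OPENAI_API_KEY,
    osoUrl: env.OSO_URL || DEFAULT_OSO_URL,
    osoAuth: env.OSO_AUTH || DEFAULT_OSO_AUTH,
    // DEBUG is a comma-delimited list such as "main,authz,data".
    debugNamespaces: (env.DEBUG || "")
      .split(",")
      .map((name) => name.trim())
      .filter(Boolean),
  };
}

// Example with an explicit object instead of the real process environment.
const config = loadConfig({
  DATABASE_URL: "postgresql://postgres:postgres@localhost:54322/postgres",
  OPENAI_API_KEY: "YOUR_OPENAI_API_KEY",
  DEBUG: "main,authz",
});
console.log(config.osoUrl, config.debugNamespaces);
```

Passing the environment as a parameter keeps the sketch easy to exercise without touching the real process environment.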
The application supports setting environment variables with dotenv. If you place a file named .env in the repository root directory, the app will recognize any environment variables you define in that file.
The .env file is listed in .gitignore, so it won't be committed to version control. It's safe to put your API keys and database connection information in it.
When you start the chatbot with npm start, it will ask you who you are. Enter Bob or Diane.
❯ npm start
> start
> node cli.js start
? Who are you? Diane
Next, it will prompt you for a question. Enter one of the following:
- Why did Bob get a bad review?
- When are the company holidays?
? What would you like to ask? Why did Bob get a bad review?
The response from the chatbot will depend on your identity and the question you asked. For example, when Diane asks about Bob's review, the response will be something like this:
It seems that Bob received a bad review primarily due to feedback from Alice, who mentioned that he is "horrible to work with." This suggests there are significant issues related to his behavior or collaboration skills in the workplace. Additionally, it was pointed out that Bob should work on being more collaborative, which indicates that perhaps he isn't engaging effectively with his team or contributing positively to group dynamics. Furthermore, there's a suggestion that he needs to contribute more to design and architecture discussions, meaning that his input in critical areas of project development may be lacking. Overall, it looks like the feedback highlights a need for improvement in teamwork and participation in discussions. If you have any more questions about the review process or how to provide support for Bob, feel free to ask!
By setting the DEBUG environment variable, you can emit various information about what the chatbot is doing. This can be useful if you'd like more insight into what's going on under the hood without adding a bunch of console.log() statements. For example, if you set DEBUG to main,authz,data, then you'll see the context, authorization operations, and database operations. Examples of each type of logging follow.
main:
main I'll send the following additional context: +0ms
main (Similarity: 0.548) Alice says that Bob is horrible to work with +0ms
main (Similarity: 0.412) Bob should work on being more collaborative. +0ms
main (Similarity: 0.355) Bob should contribute more to design and architectur
authz:
authz Authorization filter query from Oso: +0ms
authz id IN (WITH RECURSIVE
authz c0(arg0, arg2) AS NOT MATERIALIZED (
authz SELECT id, document_id FROM "block"
authz ),
authz c1(arg0, arg2) AS NOT MATERIALIZED (
authz SELECT id, folder_id FROM "document"
authz ),
authz c2(arg0) AS NOT MATERIALIZED (
authz SELECT id FROM "folder" WHERE is_public=true
authz )
authz SELECT f0.arg0
authz FROM c0 AS f0, c1 AS f1
authz WHERE f0.arg2 = f1.arg0 and f1.arg2 = '1'
authz UNION SELECT f0.arg0
authz FROM c0 AS f0, c1 AS f1
authz WHERE f0.arg2 = f1.arg0
authz UNION SELECT f0.arg0
authz FROM c0 AS f0, c1 AS f1, c2 AS f2
authz WHERE f0.arg2 = f1.arg0 and f1.arg2 = f2.arg0
authz ) +0ms
data:
data Authorized similarity search: +0ms
data SELECT
data id,
data document_id,
data content,
data 1 - (embedding::vector <=> promptEmbedding::vector) as similarity
data FROM block
data WHERE id IN ([2,3,5,4,1])
data AND (1 - (embedding::vector <=> promptEmbedding::vector)) > 0.3 +0ms
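To make the query above easier to follow, here is a sketch of the same computation in plain JavaScript. In pgvector, `<=>` is the cosine distance operator, so `1 - (a <=> b)` is the cosine similarity. The `cosineSimilarity` and `authorizedSimilaritySearch` functions and the in-memory block shape are hypothetical; the app runs this logic in PostgreSQL. The final sort is an addition for readability and does not appear in the logged query.

```javascript
// Cosine similarity of two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep only authorized blocks whose similarity to the prompt exceeds the
// threshold (0.3 in the logged query), most similar first.
function authorizedSimilaritySearch(blocks, promptEmbedding, authorizedIds, threshold = 0.3) {
  const allowed = new Set(authorizedIds);
  return blocks
    .filter((block) => allowed.has(block.id))
    .map((block) => ({
      id: block.id,
      content: block.content,
      similarity: cosineSimilarity(block.embedding, promptEmbedding),
    }))
    .filter((result) => result.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity);
}

// Tiny 2-dimensional vectors stand in for real embeddings here.
const sampleBlocks = [
  { id: 1, content: "a", embedding: [1, 0] },
  { id: 2, content: "b", embedding: [0, 1] },
  { id: 3, content: "c", embedding: [1, 1] },
];
const hits = authorizedSimilaritySearch(sampleBlocks, [1, 0], [1, 2, 3]);
console.log(hits.map((hit) => hit.id)); // [ 1, 3 ]
```

Because the authorization filter runs before the similarity ranking, a block the user may not see never becomes chatbot context, however relevant it is to the prompt.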
llm:
llm Model: text-embedding-3-small +0ms
llm Prompt: Why did Bob get a bad review? +0ms
embedding:
embedding Embedding: +0ms
embedding [
embedding -0.02280535, 0.01509199, -0.038275734, 0.027753543, -0.0067892126,
embedding 0.004817212, -0.036267348, 0.01919608, 0.017915372, -0.041040897,
embedding -0.023154635, -0.058854394, -0.013090882, 0.06193974, -0.011453613,
embedding -0.006676423, 0.047851942, 0.0011242586, -0.028757736, 0.04531963,
embedding 0.035481457, 0.054371916, 0.015615917, 0.018060906, -0.015819665,
embedding 0.030504158, 0.021277232, -0.027375152, 0.033647716, 0.009634424,
embedding -0.0140950745, -0.026239978, 0.045930877, -0.009707191, -0.061241172,
embedding -0.04203054, 0.009139605, -0.028481219, 0.06589829, 0.015484935,
embedding -0.011701022, 0.013738514, 0.0015290282, -0.025963463, 0.005453928,
embedding 0.035335924, -0.00019772309, 0.017682515, 0.030853441, 0.019385276,
embedding -0.031930402, 0.038828764, -0.027477028, -0.024537219, -0.0075023347,
embedding -0.035685208, -0.028088275, -0.016358146, 0.028641308, -0.0072549246,
embedding 0.049656577, -0.014597171, -0.055507086, 0.016736537, -0.08132502,
embedding -0.04133197, -0.018090013, -0.03964376, 0.008266394, 0.001715495,
embedding -0.0009896387, 0.07270934, 0.004828127, -0.009685361, 0.013134543,
embedding -0.0015326665, 0.07439754, 0.03216326, 0.0047226143, 0.0103184385,
embedding -0.010121967, 0.037751805, -0.015193865, 0.0016481851, -0.00926331,
embedding 0.00856474, -0.03830484, -0.012421421, 0.017682515, -0.036325563,
embedding -0.040575188, 0.02558507, 0.046600338, 0.01737689, 0.019923756,
embedding -0.053382274, -0.061765097, 0.017726175, -0.02105893, 0.008528357,
embedding ... 1436 more items
embedding ] +0ms