
Initial implementation of the ai extension #7183

Merged
merged 14 commits into master from ext-ai on Apr 16, 2024

Conversation

elprans
Member

@elprans elprans commented Apr 10, 2024

This adds the ai extension to EdgeDB, containing the following
functionality:

  1. The new object-level ext::ai::index (similar to fts::index) that
    automatically generates and indexes embeddings from the given index
    expression.

  2. A basic RAG interface via the /ai/rag HTTP endpoint that takes
    a query selecting from an ai::index-indexed type and uses it as
    context for a text generation question-answer completion.

Support for the above is implemented here for OpenAI, Mistral, and
Anthropic, mostly to demonstrate multi-provider support. Model and
provider metadata is expressed as annotated abstract types in ext::ai,
and the intent is that users can define models in their schemas by
extending the appropriate base type in ext::ai.

To test this out:

  1. Apply the following schema:

    using extension ai;
    
    module default {
      type Astronomy {
        content: str;
        deferred index ext::ai::index(embedding_model := 'text-embedding-3-small')
          on (.content);
      }
    };
  2. Configure the OpenAI provider:

    configure current database
    insert ext::ai::OpenAIProviderConfig {
      secret := 'sk-....',
    };
  3. Insert some data:

    insert Astronomy {
      content := 'Skies on Mars are red'
    };
    insert Astronomy {
      content := 'Skies on Earth are blue'
    };
  4. Test RAG:

    $ curl --json '{
        "query": "What color is the sky on Mars?",
        "model": "gpt-4-turbo-preview",
        "context": {"query":"Astronomy"}
      }' http://127.0.0.1:5656/branch/main/ai/rag
    
    {"response": "The sky on Mars is red."}

@elprans elprans requested review from msullivan, fantix and 1st1 April 10, 2024 21:54
@elprans elprans force-pushed the ext-ai branch 4 times, most recently from 832614e to 87bfcf5 on April 11, 2024 06:03
Member

@msullivan msullivan left a comment

A first batch of comments on the schema and index commits.

Mostly they are requests for more documentation

Comment on lines +3934 to +3952
index_id = target_index_metadata.get("id")
if index_id is None:
    raise AssertionError(
        "missing expected index metadata in FunctionCall.extras")
dimensions = target_index_metadata.get("dimensions")
if dimensions is None:
    raise AssertionError(
        "missing expected index metadata in FunctionCall.extras")
df = target_index_metadata.get("distance_function")
if df is None:
    raise AssertionError(
        "missing expected index metadata in FunctionCall.extras")
Member

Maybe extract with a single match?
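
For instance, something along these lines (a sketch; dict patterns in
`match` require Python 3.10+, and unlike the `.get()` checks above this
tests key presence rather than non-None values):

    match target_index_metadata:
        case {
            "id": index_id,
            "dimensions": dimensions,
            "distance_function": df,
        }:
            pass
        case _:
            raise AssertionError(
                "missing expected index metadata in FunctionCall.extras")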

Comment on lines +3915 to +3924
    _ctx: context.CompilerContextLevel,
    newctx: context.CompilerContextLevel,
    _inner_ctx: context.CompilerContextLevel,
Member

This one doesn't need to be addressed yet since it's really an FTS thing, but having three different contexts is quite complex; they need to be explained and likely given better names. (I think it's because the FTS stuff is picky about where stuff goes?)

@msullivan
Member

This is going to need tests. Testing the integrations themselves might not be feasible, but there is a ton of other code that we want to test.

Possible approaches:

  1. Create a dummy provider that we use for tests
  2. Stand up a local HTTP server that mocks one of the existing providers (see the sketch below)
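
A rough sketch of option 2 (names and the response shape here are
assumptions, loosely modeled on an OpenAI-style /embeddings response;
not the actual test harness):

    import json
    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class MockEmbeddingsHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers["Content-Length"])
            body = json.loads(self.rfile.read(length))
            inputs = body.get("input", [])
            # Tests only need deterministic shapes, not meaningful
            # embeddings, so return a fixed zero vector per input.
            payload = json.dumps({
                "data": [
                    {"index": i, "embedding": [0.0] * 8}
                    for i, _ in enumerate(inputs)
                ],
            }).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    # Port 0 lets the OS pick a free port; tests can read it back
    # from server.server_address.
    server = HTTPServer(("127.0.0.1", 0), MockEmbeddingsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()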

Comment on lines +389 to +912
body = json.loads(request.body)
if not isinstance(body, dict):
    raise TypeError(
        'the body of the request must be a JSON object')

context = body.get('context')
if context is None:
    raise TypeError(
        'missing required "context" object in request')
if not isinstance(context, dict):
    raise TypeError(
        '"context" value in request is not a valid JSON object')

ctx_query = context.get("query")
ctx_variables = context.get("variables")
ctx_globals = context.get("globals")
ctx_max_obj_count = context.get("max_object_count")

if not ctx_query:
    raise TypeError(
        'missing required "query" in request "context" object')

if ctx_variables is not None and not isinstance(ctx_variables, dict):
    raise TypeError('"variables" must be a JSON object')

if ctx_globals is not None and not isinstance(ctx_globals, dict):
    raise TypeError('"globals" must be a JSON object')

model = body.get('model')
if not model:
    raise TypeError(
        'missing required "model" in request')

query = body.get('query')
if not query:
    raise TypeError(
        'missing required "query" in request')

stream = body.get('stream')
if stream is None:
    stream = False
elif not isinstance(stream, bool):
    raise TypeError('"stream" must be a boolean')

if ctx_max_obj_count is None:
    ctx_max_obj_count = 5
elif not isinstance(ctx_max_obj_count, int) or ctx_max_obj_count <= 0:
    raise TypeError(
        '"context.max_object_count" must be a positive integer')
Member

Not a comment for now, but I think we're going to want a lightweight HTTP API framework at some point...
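
For flavor, a hypothetical shape such a framework could reduce the
validation above to (every name here is invented for illustration):

    from dataclasses import dataclass
    from typing import Any

    @dataclass
    class Field:
        type_: type
        required: bool = False
        default: Any = None

    def validate(body: dict[str, Any], spec: dict[str, Field]) -> dict[str, Any]:
        args = {}
        for name, field in spec.items():
            value = body.get(name, field.default)
            if value is None:
                if field.required:
                    raise TypeError(f'missing required "{name}" in request')
            elif not isinstance(value, field.type_):
                raise TypeError(f'"{name}" must be a {field.type_.__name__}')
            args[name] = value
        return args

    # `body` as parsed at the top of the handler above.
    args = validate(body, {
        "model": Field(str, required=True),
        "query": Field(str, required=True),
        "context": Field(dict, required=True),
        "stream": Field(bool, default=False),
    })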

Comment on lines +407 to +884
if not ctx_query:
    raise TypeError(
        'missing required "query" in request "context" object')
Member

Should we make sure that it parses as a standalone fragment?

Member Author

why? If it parses while wrapped then that's good enough, no?

Member

I worry that some sort of injection might be possible.

Member

Another thing we need to handle is comments in the query.

For that it probably suffices to append a newline after?
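
I.e., hypothetically (the splicing code here is made up; `#` starts a
line comment in EdgeQL, so the newline keeps a trailing comment from
swallowing the wrapper's closing paren):

    ctx_query = "Astronomy  # find space facts"
    wrapped = f"select ({ctx_query}\n)"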

Member Author

The API here isn't intended to be used by untrusted parties; it is similar in that regard to edgeql+http or graphql. We also explicitly disable all capabilities, including mutation.

Member

Maybe my answer is just "it squicks me out", then

Member

I think there are queries with unbalanced parens that would work
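
A hypothetical illustration of the concern: the fragment below has
unbalanced parens, yet the wrapped result is still syntactically valid
and selects more than the caller intended.

    fragment = "Astronomy) union (select User"
    wrapped = f"select ({fragment})"
    # -> select (Astronomy) union (select User)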

@msullivan
Member

Really impressive work.

I might still have some more questions about the integrations.

@elprans elprans requested a review from msullivan April 12, 2024 16:31
@elprans
Member Author

elprans commented Apr 12, 2024

Pushed a bunch of fixup commits. I realized that I initially misunderstood how embedding shortening works, so I reworked the model metadata around that. I also removed the creation of discrete subvector columns for shortening-capable embeddings, since it should be possible to create expression indexes on subvectors directly (using the yet-unreleased subvector() in pgvector).

@elprans
Member Author

elprans commented Apr 12, 2024

Oh, and I fixed handling of text-embedding-3-large by truncating its output to 2000 dimensions (the maximum pgvector can index).
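
A sketch of what such truncation involves (assuming the provider
returns a plain list of floats; OpenAI's shortening-capable embeddings
remain meaningful when cut but are no longer unit-length, hence the
re-normalization):

    import math

    def truncate_embedding(vec: list[float], dims: int = 2000) -> list[float]:
        vec = vec[:dims]
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec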

@msullivan
Member

Looking good, I think!

elprans added 2 commits April 15, 2024 14:11
Implementation merged in #7174 is buggy, fix it.

Add abstract schema definitions for the new `ai` extension:

1. Provider config objects.
2. Abstract model types intended to house model metadata via
   annotations.
3. The `ext::ai::index` abstract object-level index similar to
   `fts::index`.
4. The `ext::ai::search` function similar to `fts::search`.
5. The `ext::ai::to_str_context` function, used to "stringify" objects returned
   by `ai::search` (or other search) for the purposes of generating
   text context for submission to an LLM.
6. The `ext::ai::ChatPrompt` type used to structurally define
   LLM chat prompts.
elprans added 3 commits April 15, 2024 14:54
The `ext::ai::index` is (currently) an always-deferred index.  Under the
hood it adds several new columns to the relation of the object type it
is declared on: one for each embedding vector variant (if the text
extraction model supports outputs of varying dimensionality, a.k.a.
Matryoshka Representation), and one to denote if the embeddings are
up-to-date with respect to the object content (maintained by a trigger).

Declaration of `ext::ai::index` indexes also results in population of
several internal views that expose objects-to-be-indexed for the
deferred indexing process to consume.

The text extraction model used to convert the index expression to an
embedding is passed as a keyword argument when declaring a concrete
index.  Model metadata (name, dimensionality, limits, etc.) is then
persisted on the index as internal annotations for ease of access.

The `ext::ai::search` function is compiled into a vector distance search against the
generated embeddings columns.  The distance function used is determined
by the arguments of `ext::ai::index` defined on the object type being
searched.

The `ext::ai::to_context` function takes an object and returns the result of
evaluation of an `ext::ai::index` expression defined on it.  It is an
error to pass an object that does not have an `ext::ai::index`.
@elprans
Member Author

elprans commented Apr 15, 2024

@scotttrinh, added you as a reviewer as I'm refactoring the auth ext tests a bit in the last few commits.

@msullivan
Member

That test looks good. Do we need some more for various schema manipulations? RAG?

Contributor

@scotttrinh scotttrinh left a comment

Test changes look good!

elprans added 3 commits April 15, 2024 22:28
This implements the auto-vectorization of content indexed with
`ext::ai::index` via communication with the corresponding model API.
The code here is generic; there are no concrete model API
implementations yet.
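
The overall shape is roughly the following (a sketch; all names here
are hypothetical stand-ins for the server's internals):

    async def vectorize_pending(fetch_stale, embed, store, batch_size=100):
        # fetch_stale(n) -> up to n (object_id, text) pairs whose
        #                   embeddings are missing or stale
        # embed(texts)   -> vectors from the provider's model API
        # store(pairs)   -> write vectors back, mark objects up to date
        while batch := await fetch_stale(batch_size):
            vectors = await embed([text for _, text in batch])
            await store(list(zip((oid for oid, _ in batch), vectors)))
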
This is a convenience proxy to the upstream model `/embeddings` API.

This `POST` request expects a JSON object in the request body,
containing the following:

* `model`: the name of the text generation model to use.  Must
  match exactly one `ext::ai::model_name` annotation on an
  `ext::ai::TextGenerationModel` subtype.
* `query`: user text input, used both to rank the selected context
  objects and as the user prompt to the AI assistant.
* `context`:
  - `query`: arbitrary non-DML EdgeQL query returning a set of
    objects whose type must be indexed with `ext::ai::index`.
  - `variables`: values for any EdgeQL query variables in
    `query`.
  - `globals`: values for any EdgeQL globals that the `query`
    might depend on.
  - `max_object_count`: maximum count of objects to include
    in the prompt context after running the similarity search
    against `query`.
* `prompt`:
  - `id`: ID of an existing `ext::ai::ChatPrompt` object containing
    prompt configuration for this request;
  - `name`: name (`.name`) of an existing `ext::ai::ChatPrompt` object
    containing prompt configuration for this request (mutually exclusive
    with `prompt.id`);
  - `custom`: a list of `{"role": ..., "content": ...}` prompt messages
    to add to the pre-defined prompt (if `prompt.id` or `prompt.name`
    is specified), or to use as the whole prompt if neither `prompt.id`
    nor `prompt.name` is specified.  The `role` may be `system`
    (to configure the general parameters of the chat), `user` (the user
    prompt) or `assistant` (constrains or prefixes the LLM response).
* `stream`: if `true`, the response will be streamed as server-sent
  events (`text/event-stream`); otherwise the entire response is sent
  all at once as a JSON object (see the consumption sketch below).
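
Consuming the streaming variant could look like this (a stdlib-only
sketch; endpoint and payload as in the example above, and the events
are assumed to follow standard text/event-stream framing):

    import json
    import urllib.request

    payload = json.dumps({
        "query": "What color is the sky on Mars?",
        "model": "gpt-4-turbo-preview",
        "context": {"query": "Astronomy"},
        "stream": True,
    }).encode()

    req = urllib.request.Request(
        "http://127.0.0.1:5656/branch/main/ai/rag",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # the response body arrives as SSE lines
            line = raw.decode().strip()
            if line.startswith("data:"):
                print(line[len("data:"):].strip())
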
elprans added 6 commits April 16, 2024 09:22
Wire in OpenAI models: `gpt-{3.5,4}-turbo`, as well as text embedding
models: `text-embedding-3-{small,large}`.

Wire in Mistral models: `mistral-{small,medium,large}-latest`, as well as
the `mistral-embed` text embedding model.

Wire in Anthropic models: `claude-3-{haiku,sonnet,opus}`.

All tests in the test case use it, so it's an appropriate thing to do,
and it allows us to avoid setting the ContextVar in the mock server guts.

There's nothing specific to the auth extension in the mock HTTP server
implementation, and it will be useful in tests of other HTTP extensions,
so move it to `testbase.http` and rename it to `MockHttpServer`.
@elprans elprans merged commit c983c6c into master Apr 16, 2024
23 checks passed
@elprans elprans deleted the ext-ai branch April 16, 2024 16:37
msullivan added a commit that referenced this pull request Apr 17, 2024
The qlast refactor PR (#7167) removed qlast while #7183 added more
references. Fix them.