diff --git a/docs/ai/guide_edgeql.rst b/docs/ai/guide_edgeql.rst
new file mode 100644
index 00000000000..fd6f2d6b8b8
--- /dev/null
+++ b/docs/ai/guide_edgeql.rst
@@ -0,0 +1,260 @@
.. _ref_ai_quickstart_edgeql:

================
Gel AI in EdgeQL
================

:edb-alt-title: Gel AI Quickstart in EdgeQL


Gel AI brings vector search capabilities and retrieval-augmented generation
directly into the database.


Enable and configure the extension
==================================

AI is a Gel extension. To enable it, we need to add the extension to the
app's schema:

.. code-block:: sdl

    using extension ai;


Gel AI uses external APIs to generate embedding vectors and LLM completions.
For it to work, we need to configure an API provider and supply its API key.
Let's open an EdgeQL REPL and run the following query:

.. code-block:: edgeql

    configure current database
    insert ext::ai::OpenAIProviderConfig {
      secret := 'sk-....',
    };


Now our Gel application can take advantage of OpenAI's API to implement AI
capabilities.


.. note::

    Gel AI comes with its own Admin panel that can be used to configure
    providers, set up prompts, and test them in a sandbox. Learn more in the
    reference manual.


.. note::

    Most API providers charge for their services, so make sure your account
    has sufficient credit before proceeding.
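To double-check that the provider configuration was stored, you can inspect
the extension's config object. A quick sanity query (the exact shape of the
output depends on your Gel version):

.. code-block:: edgeql

    select cfg::Config.extensions[is ext::ai::Config] {*};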
Add vectors and perform similarity search
=========================================

Before we start introducing AI capabilities, let's set up our database with a
schema and populate it with some data (we're going to be helping Komi-san keep
track of her friends).

.. code-block:: sdl

    module default {
        type Friend {
            required name: str {
                constraint exclusive;
            };

            summary: str;                # A brief description of personality and role
            relationship_to_komi: str;   # Relationship with Komi
            defining_trait: str;         # Primary character trait or quirk
        }
    }

.. code-block:: bash
    :class: collapsible

    $ cat << 'EOF' > populate_db.edgeql
    insert Friend {
        name := 'Tadano Hitohito',
        summary := 'An extremely average high school boy with a remarkable ability to read the atmosphere and understand others\' feelings, especially Komi\'s.',
        relationship_to_komi := 'First friend and love interest',
        defining_trait := 'Perceptiveness',
    };

    insert Friend {
        name := 'Osana Najimi',
        summary := 'An extremely outgoing person who claims to have been everyone\'s childhood friend. Gender: Najimi.',
        relationship_to_komi := 'Second friend and social catalyst',
        defining_trait := 'Universal childhood friend',
    };

    insert Friend {
        name := 'Yamai Ren',
        summary := 'An intense and sometimes obsessive classmate who is completely infatuated with Komi.',
        relationship_to_komi := 'Self-proclaimed guardian and admirer',
        defining_trait := 'Obsessive devotion',
    };

    insert Friend {
        name := 'Katai Makoto',
        summary := 'An intimidating-looking but shy student who shares many communication problems with Komi.',
        relationship_to_komi := 'Fellow communication-challenged friend',
        defining_trait := 'Scary appearance but gentle nature',
    };

    insert Friend {
        name := 'Nakanaka Omoharu',
        summary := 'A self-proclaimed wielder of dark powers who acts like an anime character and is actually just a regular gaming enthusiast.',
        relationship_to_komi := 'Gaming buddy and chuunibyou friend',
        defining_trait := 'Chuunibyou tendencies',
    };
    EOF
    $ gel query -f populate_db.edgeql


In order to get Gel to produce embedding vectors, we need to create a special
``deferred index`` on the type we would like to perform similarity search on.
More specifically, we need to specify an EdgeQL expression that produces the
string we're going to create an embedding vector for. This is how we would
set up an index if we wanted to perform similarity search on
``Friend.summary``:

.. code-block:: sdl-diff

      module default {
          type Friend {
              required name: str {
                  constraint exclusive;
              };

              summary: str;                # A brief description of personality and role
              relationship_to_komi: str;   # Relationship with Komi
              defining_trait: str;         # Primary character trait or quirk

    +         deferred index ext::ai::index(embedding_model := 'text-embedding-3-small')
    +             on (.summary);
          }
      }


In our case, though, it would be better if we could run similarity search
across all properties at the same time. We can define the index on a more
complex expression, such as a concatenation of the string properties, like
this:


.. code-block:: sdl-diff

      module default {
          type Friend {
              required name: str {
                  constraint exclusive;
              };

              summary: str;                # A brief description of personality and role
              relationship_to_komi: str;   # Relationship with Komi
              defining_trait: str;         # Primary character trait or quirk

              deferred index ext::ai::index(embedding_model := 'text-embedding-3-small')
    -             on (.summary);
    +             on (
    +                 .name ++ ' ' ++ .summary ++ ' '
    +                 ++ .relationship_to_komi ++ ' '
    +                 ++ .defining_trait
    +             );
          }
      }


Once we're done with the schema modifications, we need to apply them by
creating and running a migration:

.. code-block:: bash

    $ gel migration create
    $ gel migrate


That's it! Gel will make the necessary API requests in the background and
create an index that will enable us to perform efficient similarity search
like this:

.. code-block:: edgeql

    select ext::ai::search(Friend, query_vector);
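The search results come back as tuples of the matched object and its distance
from the query, so they can be ordered and trimmed like any other set. A small
sketch, assuming ``$query_vector`` holds a query embedding (we'll see how to
produce one in a moment):

.. code-block:: edgeql

    with result := ext::ai::search(Friend, <array<float32>>$query_vector)
    select result
    order by result.distance
    limit 3;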
Note that this function accepts an embedding vector as the second argument,
not a text string. This means that in order to similarity search for a
string, we need to create a vector embedding for it using the same model as
we used to create the index. Gel offers an HTTP endpoint ``/ai/embeddings``
that can handle it for us. All we need to do is pass the vector it produces
into the search query:


.. code-block:: bash

    $ curl --user user:password \
        --json '{"input": "Who helps Komi make friends?", "model": "text-embedding-3-small"}' \
        http://localhost:/branch/main/ai/embeddings |
        jq -r '.data[0].embedding' |  # extract the embedding out of the JSON
        tr -d '\n' |                  # remove newlines
        sed 's/^\[//;s/\]$//' |       # remove square brackets
        awk '{print "select ext::ai::search(Friend, <array<float32>>[" $0 "]);"}' |  # assemble the query
        gel query --file -            # pass the query into the Gel CLI

.. note::

    Note that we're passing our login and password in order to authenticate
    the request. We can find those using the CLI: ``gel instance credentials
    --json``. Learn about all the other ways you can authenticate a request
    :ref:`here `.


Use the built-in RAG
====================

One more feature Gel AI offers is built-in retrieval-augmented generation,
also known as RAG.

Gel comes preconfigured to be able to process our text query, perform
similarity search across the index we just created, pass the results to an
LLM, and return a response. We can access the built-in RAG using the
``/ai/rag`` HTTP endpoint:


.. code-block:: bash

    $ curl --user user:password --json '{
        "query": "Who helps Komi make friends?",
        "model": "gpt-4-turbo-preview",
        "context": {"query":"select Friend"}
      }' http://localhost:/branch/main/ai/rag


We can also stream the response like this:


.. code-block:: bash-diff

      $ curl --user user:password --json '{
          "query": "Who helps Komi make friends?",
          "model": "gpt-4-turbo-preview",
    -     "context": {"query":"select Friend"}
    +     "context": {"query":"select Friend"},
    +     "stream": true
        }' http://localhost:/branch/main/ai/rag


Keep going!
===========

You are now sufficiently equipped to use Gel AI in your applications.

If you'd like to build something on your own, make sure to check out the
Reference manual to learn the details of using different APIs and models,
configuring prompts, and using the UI. Make sure to also check out the Gel AI
bindings in Python and JavaScript if those languages are relevant to you.

And if you would like more guidance on how Gel AI can fit into an
application, take a look at the FastAPI Gel AI Tutorial, where we build a
search bot using the features you learned about above.

diff --git a/docs/ai/guide_python.rst b/docs/ai/guide_python.rst
new file mode 100644
index 00000000000..5471f331d67
--- /dev/null
+++ b/docs/ai/guide_python.rst
@@ -0,0 +1,334 @@
.. _ref_ai_quickstart_python:

================
Gel AI in Python
================

:edb-alt-title: Gel AI Quickstart in Python

Gel AI brings vector search capabilities and retrieval-augmented generation
directly into the database. It's integrated into the Gel Python binding via
the ``gel.ai`` module.

.. code-block:: bash

    $ pip install 'gel[ai]'


Enable and configure the extension
==================================

AI is a Gel extension. To enable it, we need to add the extension to the
app's schema:

.. code-block:: sdl

    using extension ai;


Gel AI uses external APIs to generate embedding vectors and LLM completions.
For it to work, we need to configure an API provider and supply its API key.
Let's open an EdgeQL REPL and run the following query:

.. code-block:: edgeql

    configure current database
    insert ext::ai::OpenAIProviderConfig {
      secret := 'sk-....',
    };


Now our Gel application can take advantage of OpenAI's API to implement AI
capabilities.
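Other providers can be configured in exactly the same way. For example,
assuming you have a Mistral API key, the equivalent configuration would be:

.. code-block:: edgeql

    configure current database
    insert ext::ai::MistralProviderConfig {
      secret := '...',
    };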
.. note::

    Gel AI comes with its own Admin panel that can be used to configure
    providers, set up prompts, and test them in a sandbox. Learn more in the
    reference manual.


.. note::

    Most API providers charge for their services, so make sure your account
    has sufficient credit before proceeding.


Add vectors
===========

Before we start introducing AI capabilities, let's set up our database with a
schema and populate it with some data (we're going to be helping Komi-san keep
track of her friends).

.. code-block:: sdl

    module default {
        type Friend {
            required name: str {
                constraint exclusive;
            };

            summary: str;                # A brief description of personality and role
            relationship_to_komi: str;   # Relationship with Komi
            defining_trait: str;         # Primary character trait or quirk
        }
    }

.. code-block:: bash
    :class: collapsible

    $ cat << 'EOF' > populate_db.edgeql
    insert Friend {
        name := 'Tadano Hitohito',
        summary := 'An extremely average high school boy with a remarkable ability to read the atmosphere and understand others\' feelings, especially Komi\'s.',
        relationship_to_komi := 'First friend and love interest',
        defining_trait := 'Perceptiveness',
    };

    insert Friend {
        name := 'Osana Najimi',
        summary := 'An extremely outgoing person who claims to have been everyone\'s childhood friend. Gender: Najimi.',
        relationship_to_komi := 'Second friend and social catalyst',
        defining_trait := 'Universal childhood friend',
    };

    insert Friend {
        name := 'Yamai Ren',
        summary := 'An intense and sometimes obsessive classmate who is completely infatuated with Komi.',
        relationship_to_komi := 'Self-proclaimed guardian and admirer',
        defining_trait := 'Obsessive devotion',
    };

    insert Friend {
        name := 'Katai Makoto',
        summary := 'An intimidating-looking but shy student who shares many communication problems with Komi.',
        relationship_to_komi := 'Fellow communication-challenged friend',
        defining_trait := 'Scary appearance but gentle nature',
    };

    insert Friend {
        name := 'Nakanaka Omoharu',
        summary := 'A self-proclaimed wielder of dark powers who acts like an anime character and is actually just a regular gaming enthusiast.',
        relationship_to_komi := 'Gaming buddy and chuunibyou friend',
        defining_trait := 'Chuunibyou tendencies',
    };
    EOF
    $ gel query -f populate_db.edgeql


In order to get Gel to produce embedding vectors, we need to create a special
``deferred index`` on the type we would like to perform similarity search on.
More specifically, we need to specify an EdgeQL expression that produces the
string we're going to create an embedding vector for. This is how we would
set up an index if we wanted to perform similarity search on
``Friend.summary``:

.. code-block:: sdl-diff

      module default {
          type Friend {
              required name: str {
                  constraint exclusive;
              };

              summary: str;                # A brief description of personality and role
              relationship_to_komi: str;   # Relationship with Komi
              defining_trait: str;         # Primary character trait or quirk

    +         deferred index ext::ai::index(embedding_model := 'text-embedding-3-small')
    +             on (.summary);
          }
      }
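Before committing to an index expression, it can be handy to preview exactly
which text will be sent off for embedding generation. The
``ext::ai::to_context`` function evaluates the index expression for each
object, so a quick check looks like this:

.. code-block:: edgeql

    select ext::ai::to_context(Friend);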
In our case, though, it would be better if we could run similarity search
across all properties at the same time. We can define the index on a more
complex expression, such as a concatenation of the string properties, like
this:


.. code-block:: sdl-diff

      module default {
          type Friend {
              required name: str {
                  constraint exclusive;
              };

              summary: str;                # A brief description of personality and role
              relationship_to_komi: str;   # Relationship with Komi
              defining_trait: str;         # Primary character trait or quirk

              deferred index ext::ai::index(embedding_model := 'text-embedding-3-small')
    -             on (.summary);
    +             on (
    +                 .name ++ ' ' ++ .summary ++ ' '
    +                 ++ .relationship_to_komi ++ ' '
    +                 ++ .defining_trait
    +             );
          }
      }


Once we're done with the schema modifications, we need to apply them by
creating and running a migration:

.. code-block:: bash

    $ gel migration create
    $ gel migrate


That's it! Gel will make the necessary API requests in the background and
create an index that will enable us to perform efficient similarity search.


Perform similarity search in Python
===================================

In order to run queries against the index we just created, we need to create
a Gel client and pass it to a Gel AI instance.

.. code-block:: python

    import gel
    import gel.ai

    gel_client = gel.create_client()
    gel_ai = gel.ai.create_ai(gel_client)


We are going to execute a query that calls a single function:
``ext::ai::search(<object>, <query_vector>)``. That function accepts an
embedding vector as the second argument, not a text string. This means that
in order to similarity search for a string, we need to create a vector
embedding for it using the same model as we used to create the index. The Gel
AI binding in Python comes with a ``generate_embeddings`` function that does
exactly that:


.. code-block:: python-diff

      import gel
      import gel.ai

      gel_client = gel.create_client()
      gel_ai = gel.ai.create_ai(gel_client)

    + text = "Who helps Komi make friends?"
    + vector = gel_ai.generate_embeddings(
    +     text,
    +     "text-embedding-3-small",
    + )


Now we can plug that vector directly into our query to get similarity search
results:


.. code-block:: python-diff

      import gel
      import gel.ai

      gel_client = gel.create_client()
      gel_ai = gel.ai.create_ai(gel_client)

      text = "Who helps Komi make friends?"
      vector = gel_ai.generate_embeddings(
          text,
          "text-embedding-3-small",
      )

    + gel_client.query(
    +     "select ext::ai::search(Friend, <array<float32>>$embedding_vector)",
    +     embedding_vector=vector,
    + )
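Putting the pieces together, here's the whole flow as a single runnable
sketch, assuming the schema and provider configuration from above:

.. code-block:: python

    import gel
    import gel.ai


    def main() -> None:
        # Connect to the project's Gel instance and set up the AI wrapper
        gel_client = gel.create_client()
        gel_ai = gel.ai.create_ai(gel_client)

        # Embed the query text with the same model the index uses
        vector = gel_ai.generate_embeddings(
            "Who helps Komi make friends?",
            "text-embedding-3-small",
        )

        # Run similarity search against the deferred index on Friend
        results = gel_client.query(
            "select ext::ai::search(Friend, <array<float32>>$embedding_vector)",
            embedding_vector=vector,
        )
        for row in results:
            print(row)


    if __name__ == "__main__":
        main()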
Use the built-in RAG
====================

One more feature Gel AI offers is built-in retrieval-augmented generation,
also known as RAG.

Gel comes preconfigured to be able to process our text query, perform
similarity search across the index we just created, pass the results to an
LLM, and return a response. In order to access the built-in RAG, we need to
start by selecting an LLM and passing its name to the Gel AI instance
constructor:


.. code-block:: python-diff

      import gel
      import gel.ai

      gel_client = gel.create_client()
      gel_ai = gel.ai.create_ai(
          gel_client,
    +     model="gpt-4-turbo-preview",
      )


Now we can access the RAG using the ``query_rag`` function like this:


.. code-block:: python-diff

      import gel
      import gel.ai

      gel_client = gel.create_client()
      gel_ai = gel.ai.create_ai(
          gel_client,
          model="gpt-4-turbo-preview",
      )

    + gel_ai.query_rag(
    +     "Who helps Komi make friends?",
    +     context="Friend",
    + )

We can also stream the response like this:


.. code-block:: python-diff

      import gel
      import gel.ai

      gel_client = gel.create_client()
      gel_ai = gel.ai.create_ai(
          gel_client,
          model="gpt-4-turbo-preview",
      )

    - gel_ai.query_rag(
    + gel_ai.stream_rag(
          "Who helps Komi make friends?",
          context="Friend",
      )

Keep going!
===========

You are now sufficiently equipped to use Gel AI in your applications.

If you'd like to build something on your own, make sure to check out the
Reference manual to learn the details of using different APIs and models,
configuring prompts, and using the UI.

And if you would like more guidance on how Gel AI can fit into an
application, take a look at the FastAPI Gel AI Tutorial, where we build a
search bot using the features you learned about above.

diff --git a/docs/ai/index.rst b/docs/ai/index.rst
index 88134092fdf..7ff8c00a333 100644
--- a/docs/ai/index.rst
+++ b/docs/ai/index.rst
@@ -1,267 +1,27 @@
 .. _ref_ai_overview:

-==
-AI
-==
-
-.. toctree::
-    :hidden:
-    :maxdepth: 3
-
-    javascript
-    python
-    reference
+======
+Gel AI
+======

 :edb-alt-title: Using Gel AI

-|Gel| AI allows you to ship AI-enabled apps with practically no effort. It
-automatically generates embeddings for your data. Works with OpenAI, Mistral
-AI, Anthropic, and any other provider with a compatible API.
-
-
-Enable extension in your schema
-===============================
-
-AI is a |Gel| extension. To enable it, you will need to add the extension
-to your app's schema:
-
-.. code-block:: sdl
-
-    using extension ai;
-
-
-Extension configuration
-=======================
-
-The AI extension may be configured via our UI or via EdgeQL. To use the
-built-in UI, access it by running :gelcmd:`ui`. If you have the extension
-enabled in your schema as shown above and have migrated that schema change, you
-will see the "AI Admin" icon in the left-hand toolbar.
-
-.. image:: images/ui-ai.png
-    :alt: The Gel local development server UI highlighting the AI admin
-        icon in the left-hand toolbar. The icon is two stars, one larger and
-        one smaller, the smaller being a light pink color and the larger
-        being a light blue when selected.
-    :width: 100%
-
-The default tab "Playground" allows you to test queries against your data after
-you first configure the model, prompt, and context query in the right sidebar.
-
-The "Prompts" tab allows you to configure prompts for use in the playground.
-The "Providers" tab must be configured for the API you want to use for
-embedding generation and querying. We currently support OpenAI, Mistral AI, and
-Anthropic.
-
-
-Configuring a provider
-----------------------
-
-To configure a provider, you will first need to obtain an API key for your
-chosen provider, which you may do from their respective sites:
-
-* `OpenAI API keys `__
-* `Mistral API keys `__
-* `Anthropic API keys `__
-
-With your API key, you may now configure in the UI by clickin the "Add
-Provider" button, selecting the appropriate API, and pasting your key in the
-"Secret" field.
-
-.. image:: images/ui-ai-add-provider.png
-    :alt: The "Add Provider" form of the Gel local development server UI.
- On the left, the sidebar navigation for the view showing Playground, - Prompts, and Providers options, with Provider selected (indicated - with a purple border on the left). The main content area shows a - heading Providers with a form under it. The form contains a dropdown - to select the API. (Anthropic is currently selected.) The form - contains two fields: an optional Client ID and a Secret. The Secret - field is filled with your-api-key-here. Under the fields to the - right, the form has a gray button to cancel and a purple Add Provider - button. - :width: 100% - -You may alternatively configure a provider via EdgeQL: - -.. code-block:: edgeql - - configure current branch - insert ext::ai::OpenAIProviderConfig { - secret := 'sk-....', - }; - -This object has other properties as well, including ``client_id`` and -``api_url``, which can be set as strings to override the defaults for the -chosen provider. - -We have provider config types for each of the three supported APIs: - -* ``OpenAIProviderConfig`` -* ``MistralProviderConfig`` -* ``AnthropicProviderConfig`` - - -Usage -===== - -Using |Gel| AI requires some changes to your schema. - - -Add an index ------------- - -To start using |Gel| AI on a type, create an index: - -.. code-block:: sdl-diff - - module default { - type Astronomy { - content: str; - + deferred index ext::ai::index(embedding_model := 'text-embedding-3-small') - + on (.content); - } - }; - -In this example, we have added an AI index on the ``Astronomy`` type's -``content`` property using the ``text-embedding-3-small`` model. Once you have -the index in your schema, :ref:`create ` and -:ref:`apply ` your migration, and you're ready -to start running queries! - -.. note:: - - The particular embedding model we've chosen here - (``text-embedding-3-small``) is an OpenAI model, so it will require an - OpenAI provider to be configured as described above. - - You may use any of :ref:`our pre-configured embedding generation models - `. - -You may want to include multiple properties in your AI index. Fortunately, you -can define an AI index on an expression: - -.. code-block:: sdl - - module default { - type Astronomy { - climate: str; - atmosphere: str; - deferred index ext::ai::index(embedding_model := 'text-embedding-3-small') - on (.climate ++ ' ' ++ .atmosphere); - } - }; - -.. note:: When AI indexes aren't working… - - If you find your queries are not returning the expected results, try - inspecting your instance logs. On a |Gel| Cloud instance, use the "Logs" - tab in your instance dashboard. On local or :ref:`CLI-linked remote - instances `, use :gelcmd:`instance logs -I - `. You may find the problem there. - - Providers impose rate limits on their APIs which can often be the source of - AI index problems. If index creation hits a rate limit, |Gel| will wait - the ``indexer_naptime`` (see the docs on :ref:`ext::ai configuration - `) and resume index creation. - - If your indexed property contains values that exceed the token limit for a - single request, you may consider truncating the property value in your - index expression. You can do this with a string by slicing it: - - .. code-block:: sdl - - module default { - type Astronomy { - content: str; - deferred index ext::ai::index(embedding_model := 'text-embedding-3-small') - on (.content[0:10000]); - } - }; - - This example will slice the first 10,000 characters of the ``content`` - property for indexing. - - Tokens are not equivalent to characters. 
For OpenAI embedding generation, - you may test values via `OpenAI's web-based tokenizer - `__. You may alternatively download - the library OpenAI uses for tokenization from that same page if you prefer. - By testing, you can get an idea how much of your content can be sent for - indexing. - - -Run a semantic similarity query -------------------------------- - -Once your index has been migrated, running a query against the embeddings is -super simple: - -.. code-block:: edgeql - - select ext::ai::search(Astronomy, query) - -Simple, but you'll still need to generate embeddings from your query or pass in -existing embeddings. If your ultimate goal is retrieval-augmented generation -(i.e., RAG), we've got you covered. - -.. _ref_ai_overview_rag: - -Use RAG via HTTP ----------------- - -By making an HTTP request to -``https://:/branch//ai/rag``, you can generate -text via the generative AI API of your choice within the context of a type with -a deferred embedding index. - -.. note:: - - Making HTTP requests to |Gel| requires :ref:`authentication - `. - -.. code-block:: bash - - $ curl --json '{ - "query": "What color is the sky on Mars?", - "model": "gpt-4-turbo-preview", - "context": {"query":"select Astronomy"} - }' https://:/branch//ai/rag - {"response": "The sky on Mars is red."} - -Since LLMs are often slow, it may be useful to stream the response. To do this, -add ``"stream": true`` to your request JSON. - -.. note:: - - The particular text generation model we've chosen here - (``gpt-4-turbo-preview``) is an OpenAI model, so it will require an OpenAI - provider to be configured as described above. - - You may use any of our supported :ref:`text generation models - `. - - -Use RAG via JavaScript ----------------------- +|Gel| AI is a set of tools designed to enable you to ship AI-enabled apps with +practically no effort. This is what comes in the box: -``@gel/ai`` offers a convenient wrapper around ``ext::ai``. Install it with -``npm install @gel/ai`` (or via your package manager of choice) and -implement it like this example: +1. ``ext::ai``: this Gel extension automatically generates embeddings for your + data. Works with OpenAI, Mistral AI, Anthropic, and any other provider with a + compatible API. -.. code-block:: typescript +2. ``ext::vectorstore``: this extension is designed to replicate workflows that + might be familiar to you from vectorstore-style databases. Powered by + ``pgvector``, it allows you to store and search for embedding vectors, and + integrates with popular AI frameworks. - import { createClient } from "gel"; - import { createAI } from "@gel/ai"; +3. Python library: ``gel.ai``. Access all Gel AI features straight from your + Python application. - const client = createClient(); +4. JavaScript library: ``gel.ai``. - const gpt4AI = createAI(client, { - model: "gpt-4-turbo-preview", - }); - const blogAI = gpt4AI.withContext({ - query: "select Astronomy" - }); - console.log(await blogAI.queryRag( - "What color is the sky on Mars?" - )); diff --git a/docs/ai/quickstart_fastapi_ai.rst b/docs/ai/quickstart_fastapi_ai.rst new file mode 100644 index 00000000000..821daf57d7e --- /dev/null +++ b/docs/ai/quickstart_fastapi_ai.rst @@ -0,0 +1,346 @@ +.. _ref_quickstart_ai: + +====================== +Using the built-in RAG +====================== + +.. edb:split-section:: + + In this section we'll use |Gel|'s built-in vector search and + retrieval-augmented generation capabilities to decorate our flashcard app + with a couple AI features. 
    We're going to create a ``/fetch_similar`` endpoint that looks up
    flashcards similar to a text search query, as well as a ``/fetch_rag``
    endpoint that lets us talk to an LLM about the contents of our flashcard
    deck.

    We're going to start with the same schema we left off with in the primary
    quickstart.


    .. code-block:: sdl
        :caption: dbschema/default.gel

        module default {
            abstract type Timestamped {
                required created_at: datetime {
                    default := datetime_of_statement();
                };
                required updated_at: datetime {
                    default := datetime_of_statement();
                };
            }

            type Deck extending Timestamped {
                required name: str;
                description: str;
                cards := (
                    select .<deck[is Card]
                    order by .order
                );
            }

            type Card extending Timestamped {
                required order: int64;
                required front: str;
                required back: str;
                required deck: Deck;
            }
        }


.. edb:split-section::

    Before the AI features can work, we need to enable the ``ai`` extension by
    adding ``using extension ai;`` to the top of the schema file and running a
    migration. Then we configure an API provider and supply its API key:


    .. code-block:: edgeql

        configure current database
        insert ext::ai::OpenAIProviderConfig {
          secret := 'sk-....',
        };


.. edb:split-section::

    One last thing before we move on. Let's add some sample data to give the
    embedding model something to work with. You can copy and run this command
    in the terminal, or come up with your own sample data.


    .. code-block:: bash
        :class: collapsible

        $ cat << 'EOF' | gel query --file -
        with deck := (
            insert Deck {
                name := 'Smelly Cheeses',
                description := 'To impress everyone with stinky cheese trivia.'
            }
        )
        for card_data in {(
            1,
            'Époisses de Bourgogne',
            'Known as the "king of cheeses", this French cheese is so pungent it\'s banned on public transport in France. Washed in brandy, it becomes increasingly funky as it ages. Orange-red rind, creamy interior.'
        ), (
            2,
            'Vieux-Boulogne',
            'Officially the smelliest cheese in the world according to scientific studies. This northern French cheese has a reddish-orange rind from being washed in beer. Smooth, creamy texture with a powerful aroma.'
        ), (
            3,
            'Durian Cheese',
            'This Malaysian creation combines durian fruit with cheese, creating what some consider the ultimate "challenging" dairy product. Combines the pungency of blue cheese with durian\'s notorious aroma.'
        ), (
            4,
            'Limburger',
            'German cheese famous for its intense smell, often compared to foot odor due to the same bacteria. Despite its reputation, has a surprisingly mild taste with notes of mushroom and grass.'
        ), (
            5,
            'Roquefort',
            'The "king of blue cheeses", aged in limestone caves in southern France. Contains Penicillium roqueforti mold. Strong, tangy, and salty with a crumbly texture. Legend says it was discovered when a shepherd left his lunch in a cave.'
        ), (
            6,
            'What makes washed-rind cheeses so smelly?',
            'The process of washing cheese rinds in brine, alcohol, or other solutions promotes the growth of Brevibacterium linens, the same bacteria responsible for human body odor. This bacteria contributes to both the orange color and distinctive aroma.'
        ), (
            7,
            'Stinking Bishop',
            'Named after the Stinking Bishop pear (not a religious figure). This English cheese is washed in perry made from these pears. Known for its powerful aroma and sticky, pink-orange rind. Gained fame after being featured in Wallace & Gromit.'
        )}
        union (
            insert Card {
                deck := deck,
                order := card_data.0,
                front := card_data.1,
                back := card_data.2
            }
        );
        EOF
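.. edb:split-section::

    To make sure the data has landed, we can run a quick sanity check. This
    query is just a sketch; it relies on the ``cards`` computed link defined
    in the schema above:


    .. code-block:: edgeql

        select Deck { name, card_count := count(.cards) };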
.. edb:split-section::

    Now we can finally start producing embedding vectors. Since |Gel| is
    fully aware of when your data gets inserted, updated, and deleted, it's
    perfectly equipped to handle all the tedious work of keeping those
    vectors up to date. All that's left for us is to create a special
    ``deferred index`` on the data we would like to perform similarity search
    on.


    .. code-block:: sdl-diff
        :caption: dbschema/default.gel

          using extension ai;

          module default {
              abstract type Timestamped {
                  required created_at: datetime {
                      default := datetime_of_statement();
                  };
                  required updated_at: datetime {
                      default := datetime_of_statement();
                  };
              }

              type Deck extending Timestamped {
                  required name: str;
                  description: str;
                  cards := (
                      select .<deck[is Card]
                      order by .order
                  );
              }

              type Card extending Timestamped {
                  required order: int64;
                  required front: str;
                  required back: str;
                  required deck: Deck;

        +         deferred index ext::ai::index(embedding_model := 'text-embedding-3-small')
        +             on (.front ++ ' ' ++ .back);
              }
          }


.. edb:split-section::

    Once the schema change is applied with a migration, |Gel| will make the
    necessary API requests in the background and build the index.


    .. code-block:: bash

        $ gel migration create
        $ gel migrate


.. edb:split-section::

    Now let's create the first of our two endpoints, ``/fetch_similar``. To
    perform similarity search, we generate an embedding vector for the query
    string using the same model the index uses, and pass it to the
    ``ext::ai::search`` function.


    .. code-block:: python-diff
        :caption: main.py

          import edgedb
          import edgedb.ai

          from fastapi import FastAPI


          client = edgedb.create_async_client()

          app = FastAPI()


        + @app.get("/fetch_similar")
        + async def fetch_similar_cards(query: str):
        +     rag = await edgedb.ai.create_async_ai(client, model="gpt-4-turbo-preview")
        +     embedding_vector = await rag.generate_embeddings(
        +         query, model="text-embedding-3-small"
        +     )
        +
        +     similar_cards = await client.query(
        +         "select ext::ai::search(Card, <array<float32>>$embedding_vector)",
        +         embedding_vector=embedding_vector,
        +     )
        +
        +     return similar_cards


.. edb:split-section::

    Let's test the endpoint to see that everything works the way we expect.


    .. code-block:: bash

        $ curl -X 'GET' \
          'http://localhost:8000/fetch_similar?query=the%20stinkiest%20cheese' \
          -H 'accept: application/json'


.. edb:split-section::

    Finally, let's create the second endpoint we mentioned, called
    ``/fetch_rag``. We'll be able to use this one to, for example, ask an LLM
    to quiz us on the contents of our deck.

    The RAG feature is represented in the Python binding with the ``query_rag``
    method of the ``GelRAG`` class. To use it, we're going to instantiate the
    class and call the method... And that's it!


    .. code-block:: python-diff
        :caption: main.py

          import edgedb
          import edgedb.ai

          from fastapi import FastAPI


          client = edgedb.create_async_client()

          app = FastAPI()


          @app.get("/fetch_similar")
          async def fetch_similar_cards(query: str):
              rag = await edgedb.ai.create_async_ai(client, model="gpt-4-turbo-preview")
              embedding_vector = await rag.generate_embeddings(
                  query, model="text-embedding-3-small"
              )

              similar_cards = await client.query(
                  "select ext::ai::search(Card, <array<float32>>$embedding_vector)",
                  embedding_vector=embedding_vector,
              )

              return similar_cards


        + @app.get("/fetch_rag")
        + async def fetch_rag_response(query: str):
        +     rag = await edgedb.ai.create_async_ai(client, model="gpt-4-turbo-preview")
        +     response = await rag.query_rag(
        +         message=query,
        +         context=edgedb.ai.QueryContext(query="select Card"),
        +     )
        +     return response


.. edb:split-section::

    Let's test the endpoint to see if it works:


    .. code-block:: bash

        $ curl -X 'GET' \
          'http://localhost:8000/fetch_rag?query=what%20cheese%20smells%20like%20feet' \
          -H 'accept: application/json'


.. edb:split-section::

    Congratulations! We've now implemented AI features in our flashcards app.
    Of course, there's more to learn when it comes to using the AI extension.
    Make sure to check out the Reference manual, or build an LLM-powered
    search bot from the ground up with the FastAPI Gel AI tutorial.

diff --git a/docs/ai/reference.rst b/docs/ai/reference.rst
deleted file mode 100644
index 85557ff33fb..00000000000
--- a/docs/ai/reference.rst
+++ /dev/null
@@ -1,671 +0,0 @@
-.. _ref_ai_reference:
-
-=======
-ext::ai
-=======
-
-To activate |Gel| AI functionality, you can use the :ref:`extension
-` mechanism:
-
-.. code-block:: sdl
-
-    using extension ai;
-
-
-.. _ref_ai_reference_config:
-
-Configuration
-=============
-
-Use the ``configure`` command to set configuration for the AI extension. Update
-the values using the ``configure session`` or the ``configure current branch``
-command depending on the scope you prefer:
-
-.. code-block:: edgeql-repl
-
-    db> configure current branch
-    ...
set ext::ai::Config::indexer_naptime := '0:00:30'; - OK: CONFIGURE DATABASE - -The only property available currently is ``indexer_naptime`` which specifies -the minimum delay between deferred ``ext::ai::index`` indexer runs on any given -branch. - -Examine the ``extensions`` link of the ``cfg::Config`` object to check the -current config values: - -.. code-block:: edgeql-repl - - db> select cfg::Config.extensions[is ext::ai::Config]{*}; - { - ext::ai::Config { - id: 1a53f942-d7ce-5610-8be2-c013fbe704db, - indexer_naptime: '0:00:30' - } - } - -You may also restore the default config value using ``configure session -reset`` if you set it on the session or ``configure current branch reset`` -if you set it on the branch: - -.. code-block:: edgeql-repl - - db> configure current branch reset ext::ai::Config::indexer_naptime; - OK: CONFIGURE DATABASE - - -Providers ---------- - -Provider configs are required for AI indexes (for embedding generation) and for -RAG (for text generation). They may be added via :ref:`ref_cli_gel_ui` or by -via EdgeQL: - -.. code-block:: edgeql - - configure current branch - insert ext::ai::OpenAIProviderConfig { - secret := 'sk-....', - }; - -The extension makes available types for each provider and for a custom provider -compatible with one of the supported API styles. - -* ``ext::ai::OpenAIProviderConfig`` -* ``ext::ai::MistralProviderConfig`` -* ``ext::ai::AnthropicProviderConfig`` -* ``ext::ai::CustomProviderConfig`` - -All provider types require the ``secret`` property be set with a string -containing the secret provided by the AI vendor. Other properties may -optionally be set: - -* ``name``- A unique provider name -* ``display_name``- A human-friendly provider name -* ``api_url``- The provider's API URL -* ``client_id``- ID for the client provided by model API vendor - -In addition to the required ``secret`` property, -``ext::ai::CustomProviderConfig requires an ``api_style`` property be set. -Available values are ``ext::ai::ProviderAPIStyle.OpenAI`` and -``ext::ai::ProviderAPIStyle.Anthropic``. - -Prompts -------- - -You may add prompts either via :ref:`ref_cli_gel_ui` or via EdgeQL. Here's -an example of how you might add a prompt with a single message: - -.. code-block:: edgeql - - insert ext::ai::ChatPrompt { - name := 'test-prompt', - messages := ( - insert ext::ai::ChatPromptMessage { - participant_role := ext::ai::ChatParticipantRole.System, - content := "Your message content" - } - ) - }; - -``participant_role`` may be any of these values: - -* ``ext::ai::ChatParticipantRole.System`` -* ``ext::ai::ChatParticipantRole.User`` -* ``ext::ai::ChatParticipantRole.Assistant`` -* ``ext::ai::ChatParticipantRole.Tool`` - -``ext::ai::ChatPromptMessage`` also has a ``participant_name`` property which -is an optional ``str``. - - -.. _ref_guide_ai_reference_index: - -Index -===== - -The ``ext::ai::index`` creates a deferred semantic similarity index of an -expression on a type. - -.. code-block:: sdl-diff - - module default { - type Astronomy { - content: str; - + deferred index ext::ai::index(embedding_model := 'text-embedding-3-small') - + on (.content); - } - }; - -It can accept several named arguments: - -* ``embedding_model``- The name of the model to use for embedding generation as - a string. - - .. 
_ref_ai_reference_embedding_models: - - You may use any of these pre-configured embedding generation models: - - **OpenAI** - - * ``text-embedding-3-small`` - * ``text-embedding-3-large`` - * ``text-embedding-ada-002`` - - `Learn more about the OpenAI embedding models `__ - - **Mistral** - - * ``mistral-embed`` - - `Learn more about the Mistral embedding model `__ -* ``distance_function``- The function to use for determining semantic - similarity. Default: ``ext::ai::DistanceFunction.Cosine`` - - The distance function may be any of these: - - * ``ext::ai::DistanceFunction.Cosine`` - * ``ext::ai::DistanceFunction.InnerProduct`` - * ``ext::ai::DistanceFunction.L2`` -* ``index_type``- The type of index to create. Currently the only option is the - default: ``ext::ai::IndexType.HNSW``. -* ``index_parameters``- A named tuple of additional index parameters: - - * ``m``- The maximum number of edges of each node in the graph. Increasing - can increase the accuracy of searches at the cost of index size. Default: - ``32`` - * ``ef_construction``- Dictates the depth and width of the search when - building the index. Higher values can lead to better connections and more - accurate results at the cost of time and resource usage when building the - index. Default: ``100`` - - -When indexes aren't working… ----------------------------- - -If you find your queries are not returning the expected results, try -inspecting your instance logs. On a |Gel| Cloud instance, use the "Logs" -tab in your instance dashboard. On local or :ref:`CLI-linked remote -instances `, use :gelcmd:`instance logs -I -`. You may find the problem there. - -Providers impose rate limits on their APIs which can often be the source of -AI index problems. If index creation hits a rate limit, Gel will wait -the ``indexer_naptime`` (see the docs on :ref:`ext::ai configuration -`) and resume index creation. - -If your indexed property contains values that exceed the token limit for a -single request, you may consider truncating the property value in your -index expression. You can do this with a string by slicing it: - -.. code-block:: sdl - - module default { - type Astronomy { - content: str; - deferred index ext::ai::index(embedding_model := 'text-embedding-3-small') - on (.content[0:10000]); - } - }; - -This example will slice the first 10,000 characters of the ``content`` -property for indexing. - -Tokens are not equivalent to characters. For OpenAI embedding generation, -you may test values via `OpenAI's web-based tokenizer -`__. You may alternatively download -the library OpenAI uses for tokenization from that same page if you prefer. -By testing, you can get an idea how much of your content can be sent for -indexing. - - -Functions -========= - -.. list-table:: - :class: funcoptable - - * - :eql:func:`ext::ai::to_context` - - :eql:func-desc:`ext::ai::to_context` - - * - :eql:func:`ext::ai::search` - - :eql:func-desc:`ext::ai::search` - - ------------- - - -.. eql:function:: ext::ai::to_context(object: anyobject) -> str - - Evaluates the expression of an :ref:`ai::index - ` on the passed object and returns it. - - This can be useful for confirming the basis of embedding generation for a - particular object or type. - - Given this schema: - - .. code-block:: sdl - - module default { - type Astronomy { - topic: str; - content: str; - deferred index ext::ai::index(embedding_model := 'text-embedding-3-small') - on (.topic ++ ' ' ++ .content); - } - }; - - and with these inserts: - - .. 
code-block:: edgeql-repl - - db> insert Astronomy { - ... topic := 'Mars', - ... content := 'Skies on Mars are red.' - ... } - db> insert Astronomy { - ... topic := 'Earth', - ... content := 'Skies on Earth are blue.' - ... } - - ``to_context`` returns these results: - - .. code-block:: edgeql-repl - - db> select ext::ai::to_context(Astronomy); - {'Mars Skies on Mars are red.', 'Earth Skies on Earth are blue.'} - db> select ext::ai::to_context((select Astronomy limit 1)); - {'Mars Skies on Mars are red.'} - - ------------- - - -.. eql:function:: ext::ai::search( \ - object: anyobject, \ - query: array \ - ) -> optional tuple - - Search an object using its :ref:`ai::index ` - index. - - Returns objects that match the specified semantic query and the - similarity score. - - .. note:: - - The ``query`` argument should *not* be a textual query but the - embeddings generated *from* a textual query. To have |Gel| generate - the query for you along with a text response, try :ref:`our built-in - RAG `. - - .. code-block:: edgeql-repl - - db> with query := >$query - ... select ext::ai::search(Knowledge, query); - { - ( - object := default::Knowledge {id: 9af0d0e8-0880-11ef-9b6b-4335855251c4}, - distance := 0.20410746335983276 - ), - ( - object := default::Knowledge {id: eeacf638-07f6-11ef-b9e9-57078acfce39}, - distance := 0.7843298847773637 - ), - ( - object := default::Knowledge {id: f70863c6-07f6-11ef-b9e9-3708318e69ee}, - distance := 0.8560434728860855 - ), - } - - -HTTP endpoints -============== - -Use the AI extension's HTTP endpoints to perform retrieval-augmented generation -using your AI indexes or to generate embeddings against a model of your choice. - -.. note:: - - All |Gel| server HTTP endpoints require :ref:`authentication - `. By default, you may use `HTTP Basic Authentication - `_ - with your Gel username and password. - - -RAG ---- - -``POST``: ``https://:/branch//ai/rag`` - -Responds with text generated by the specified text generation model in response -to the provided query. - - -Request -^^^^^^^ - -Make a ``POST`` request to the endpoint with a JSON body. The body may have -these properties: - -* ``model`` (string, required): The name of the text generation model to use. - - .. _ref_ai_reference_text_generation_models: - - You may use any of these text generation models: - - **OpenAI** - - * ``gpt-3.5-turbo`` - * ``gpt-4-turbo-preview`` - - `Learn more about the OpenAI text generation models `__ - - **Mistral** - - * ``mistral-small-latest`` - * ``mistral-medium-latest`` - * ``mistral-large-latest`` - - `Learn more about the Mistral text generation models `__ - - **Anthropic** - - * ``claude-3-haiku-20240307`` - * ``claude-3-sonnet-20240229`` - * ``claude-3-opus-20240229`` - - `Learn more about the Athropic text generation models `__ - -* ``query`` (string, required): The query string use as the basis for text - generation. - -* ``context`` (object, required): Settings that define the context of the - query. - - * ``query`` (string, required): Specifies an expression to determine the - relevant objects and index to serve as context for text generation. You may - set this to any expression that produces a set of objects, even if it is - not a standalone query. - - * ``variables`` (object, optional): A dictionary of variables for use in the - context query. - - * ``globals`` (object, optional): A dictionary of globals for use in the - context query. - - * ``max_object_count`` (int, optional): Maximum number of objects to return; - default is 5. 
- -* ``stream`` (boolean, optional): Specifies whether the response should be - streamed. Defaults to false. - -* ``prompt`` (object, optional): Settings that define a prompt. Omit to use the - default prompt. - - You may specify an existing prompt by its ``name`` or ``id``, you may define - a custom prompt inline by sending an array of objects, or you may do both to - augment an existing prompt with additional custom messages. - - * ``name`` (string, optional) or ``id`` (string, optional): The ``name`` or - ``id`` of an existing custom prompt to use. Provide only one of these if - you want to use or start from an existing prompt. - - * ``custom`` (array of objects, optional): Custom prompt messages, each - containing a ``role`` and ``content``. If no ``name`` or ``id`` was - provided, the custom messages provided here become the prompt. If one of - those was provided, these messages will be added to that existing prompt. - -**Example request** - -.. code-block:: - - curl --user : --json '{ - "query": "What color is the sky on Mars?", - "model": "gpt-4-turbo-preview", - "context": {"query":"Knowledge"} - }' http://:/branch/main/ai/rag - - -Response -^^^^^^^^ - -**Example successful response** - -* **HTTP status**: 200 OK -* **Content-Type**: application/json -* **Body**: - - .. code-block:: json - - {"response": "The sky on Mars is red."} - -**Example error response** - -* **HTTP status**: 400 Bad Request -* **Content-Type**: application/json -* **Body**: - - .. code-block:: json - - { - "message": "missing required 'query' in request 'context' object", - "type": "BadRequestError" - } - - -Streaming response (SSE) -^^^^^^^^^^^^^^^^^^^^^^^^ - -When the ``stream`` parameter is set to ``true``, the server uses `Server-Sent -Events -`__ -(SSE) to stream responses. Here is a detailed breakdown of the typical -sequence and structure of events in a streaming response: - -* **HTTP Status**: 200 OK -* **Content-Type**: text/event-stream -* **Cache-Control**: no-cache - -The stream consists of a sequence of five events, each encapsulating part of -the response in a structured format: - -1. **Message start** - - * Event type: ``message_start`` - - * Data: Starts a message, specifying identifiers and roles. - - .. code-block:: json - - { - "type": "message_start", - "message": { - "id": "", - "role": "assistant", - "model": "" - } - } - -2. **Content block start** - - * Event type: ``content_block_start`` - - * Data: Marks the beginning of a new content block. - - .. code-block:: json - - { - "type": "content_block_start", - "index": 0, - "content_block": { - "type": "text", - "text": "" - } - } - -3. **Content block delta** - - * Event type: ``content_block_delta`` - - * Data: Incrementally updates the content, appending more text to the - message. - - .. code-block:: json - - { - "type": "content_block_delta", - "index": 0, - "delta": { - "type": "text_delta", - "text": "The" - } - } - - Subsequent ``content_block_delta`` events add more text to the message. - -4. **Content block stop** - - * Event type: ``content_block_stop`` - - * Data: Marks the end of a content block. - - .. code-block:: json - - { - "type": "content_block_stop", - "index": 0 - } - -5. **Message stop** - - * Event type: ``message_stop`` - - * Data: Marks the end of the message. - - .. code-block:: json - - {"type": "message_stop"} - -Each event is sent as a separate SSE message, formatted as shown above. The -connection is closed after all events are sent, signaling the end of the -stream. - -**Example SSE response** - -.. 
code-block:: - - event: message_start - data: {"type": "message_start", "message": {"id": "chatcmpl-9MzuQiF0SxUjFLRjIdT3mTVaMWwiv", "role": "assistant", "model": "gpt-4-0125-preview"}} - - event: content_block_start - data: {"type": "content_block_start","index":0,"content_block":{"type":"text","text":""}} - - event: content_block_delta - data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": "The"}} - - event: content_block_delta - data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": " skies"}} - - event: content_block_delta - data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": " on"}} - - event: content_block_delta - data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": " Mars"}} - - event: content_block_delta - data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": " are"}} - - event: content_block_delta - data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": " red"}} - - event: content_block_delta - data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": "."}} - - event: content_block_stop - data: {"type": "content_block_stop","index":0} - - event: message_delta - data: {"type": "message_delta", "delta": {"stop_reason": "stop"}} - - event: message_stop - data: {"type": "message_stop"} - - -Embeddings ----------- - -``POST``: ``https://:/branch//ai/embeddings`` - -Responds with embeddings generated by the specified embeddings model in -response to the provided input. - -Request -^^^^^^^ - -Make a ``POST`` request to the endpoint with a JSON body. The body may have -these properties: - -* ``input`` (array of strings or a single string, required): The text to use as - the basis for embeddings generation. - -* ``model`` (string, required): The name of the embedding model to use. You may - use any of the supported :ref:`embedding models - `. - -**Example request** - -.. code-block:: - - curl --user : --json '{ - "input": "What color is the sky on Mars?", - "model": "text-embedding-3-small" - }' http://localhost:10931/branch/main/ai/embeddings - - -Response -^^^^^^^^ - -**Example successful response** - -* **HTTP status**: 200 OK -* **Content-Type**: application/json -* **Body**: - - -.. code-block:: json - - { - "object": "list", - "data": [ - { - "object": "embedding", - "index": 0, - "embedding": [-0.009434271, 0.009137661] - } - ], - "model": "text-embedding-3-small", - "usage": { - "prompt_tokens": 8, - "total_tokens": 8 - } - } - -.. note:: - - The ``embedding`` property is shown here with only two values for brevity, - but an actual response would contain many more values. - -**Example error response** - -* **HTTP status**: 400 Bad Request -* **Content-Type**: application/json -* **Body**: - - .. code-block:: json - - { - "message": "missing or empty required \"model\" value in request", - "type": "BadRequestError" - } diff --git a/docs/ai/reference_extai.rst b/docs/ai/reference_extai.rst new file mode 100644 index 00000000000..6772964acd3 --- /dev/null +++ b/docs/ai/reference_extai.rst @@ -0,0 +1,347 @@ +.. _ref_ai_extai_reference: + +============ +AI Extension +============ + +This reference documents the |Gel| AI extension's components, configuration +options, and APIs. + + +Enabling the Extension +====================== + +The AI extension can be enabled using the :ref:`extension ` mechanism: + +.. 
code-block:: sdl

    using extension ai;

Configuration
=============

The AI extension can be configured using ``configure session`` or
``configure current branch``:

.. code-block:: edgeql

    configure current branch
    set ext::ai::Config::indexer_naptime := '0:00:30';

Configuration Properties
------------------------

* ``indexer_naptime``: Duration. Specifies the minimum delay between deferred
  ``ext::ai::index`` indexer runs.

View current configuration:

.. code-block:: edgeql

    select cfg::Config.extensions[is ext::ai::Config]{*};

Reset configuration:

.. code-block:: edgeql

    configure current branch reset ext::ai::Config::indexer_naptime;


UI
==

The AI section of the UI can be accessed via the sidebar after the extension
has been enabled in the schema. It provides ways to manage provider
configurations and RAG prompts, as well as try out different settings in the
playground.

Navigation sidebar
------------------

Playground tab
--------------

Prompts tab
-----------

Providers tab
-------------


Index
=====

The ``ext::ai::index`` creates a deferred semantic similarity index of an
expression on a type.

.. code-block:: sdl-diff

      module default {
          type Astronomy {
              content: str;
    +         deferred index ext::ai::index(embedding_model := 'text-embedding-3-small')
    +             on (.content);
          }
      };


Parameters:

* ``embedding_model``: The name of the model to use for embedding generation,
  as a string.
* ``distance_function``: The function to use for determining semantic
  similarity. Default: ``ext::ai::DistanceFunction.Cosine``
* ``index_type``: The type of index to create. Currently the only option is
  the default: ``ext::ai::IndexType.HNSW``.
* ``index_parameters``: A named tuple of additional index parameters:

  * ``m``: The maximum number of edges of each node in the graph. Increasing
    this can increase the accuracy of searches at the cost of index size.
    Default: ``32``
  * ``ef_construction``: Dictates the depth and width of the search when
    building the index. Higher values can lead to better connections and more
    accurate results at the cost of time and resource usage when building the
    index. Default: ``100``

* ``dimensions``: int64 (optional). Embedding dimensions.
* ``truncate_to_max``: bool (default: ``false``).

Functions
=========

.. list-table::
    :class: funcoptable

    * - :eql:func:`ext::ai::to_context`
      - :eql:func-desc:`ext::ai::to_context`

    * - :eql:func:`ext::ai::search`
      - :eql:func-desc:`ext::ai::search`


------------


.. eql:function:: ext::ai::to_context(object: anyobject) -> str

    Returns the indexed expression value for an object with an
    ``ext::ai::index``.

    **Example**:

    Schema:

    .. code-block:: sdl

        module default {
            type Astronomy {
                topic: str;
                content: str;
                deferred index ext::ai::index(embedding_model := 'text-embedding-3-small')
                    on (.topic ++ ' ' ++ .content);
            }
        };

    Data:

    .. code-block:: edgeql-repl

        db> insert Astronomy {
        ...     topic := 'Mars',
        ...     content := 'Skies on Mars are red.'
        ... }
        db> insert Astronomy {
        ...     topic := 'Earth',
        ...     content := 'Skies on Earth are blue.'
        ... }

    Results of calling ``to_context``:

    .. code-block:: edgeql-repl

        db> select ext::ai::to_context(Astronomy);
        {'Mars Skies on Mars are red.', 'Earth Skies on Earth are blue.'}
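    The function also accepts any expression that produces a set of objects,
    so it can be used to check a single object as well:

    .. code-block:: edgeql-repl

        db> select ext::ai::to_context((select Astronomy limit 1));
        {'Mars Skies on Mars are red.'}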
------------


.. eql:function:: ext::ai::search( \
        object: anyobject, \
        query: array<float32> \
    ) -> optional tuple<object: anyobject, distance: float64>

    Searches objects using their :ref:`ai::index `.

    Returns tuples of (object, distance).

    .. note::

        The ``query`` argument should *not* be a textual query but the
        embeddings generated *from* a textual query.

    .. code-block:: edgeql-repl

        db> with query := <array<float32>>$query
        ... select ext::ai::search(Knowledge, query);

        {
          (
            object := default::Knowledge {id: 9af0d0e8-0880-11ef-9b6b-4335855251c4},
            distance := 0.20410746335983276
          ),
          (
            object := default::Knowledge {id: eeacf638-07f6-11ef-b9e9-57078acfce39},
            distance := 0.7843298847773637
          ),
          (
            object := default::Knowledge {id: f70863c6-07f6-11ef-b9e9-3708318e69ee},
            distance := 0.8560434728860855
          ),
        }


Types
=====

Provider Configuration Types
----------------------------

Provider configurations are required for AI indexes and RAG functionality.

Example provider configuration:

.. code-block:: edgeql

    configure current database
    insert ext::ai::OpenAIProviderConfig {
      secret := 'sk-....',
    };

.. note::

    All provider types require the ``secret`` property to be set with a
    string containing the secret provided by the AI vendor.


.. note::

    ``ext::ai::CustomProviderConfig`` requires an ``api_style`` property to
    be set.

ext::ai::ProviderAPIStyle
^^^^^^^^^^^^^^^^^^^^^^^^^

Enum defining supported API styles:

* ``OpenAI``
* ``Anthropic``

ext::ai::ProviderConfig
^^^^^^^^^^^^^^^^^^^^^^^

Abstract base configuration for AI providers.

Properties:

* ``name``: str (Required) - Unique provider identifier
* ``display_name``: str (Required) - Human-readable name
* ``api_url``: str (Required) - Provider API endpoint
* ``client_id``: str (Optional) - Provider-supplied client ID
* ``secret``: str (Required) - Provider API secret
* ``api_style``: ProviderAPIStyle (Required) - Provider's API style

Provider-Specific Types
^^^^^^^^^^^^^^^^^^^^^^^

* ``ext::ai::OpenAIProviderConfig``
* ``ext::ai::MistralProviderConfig``
* ``ext::ai::AnthropicProviderConfig``
* ``ext::ai::CustomProviderConfig``

Each inherits from ``ProviderConfig`` with provider-specific defaults.

Model Types
-----------

ext::ai::Model
^^^^^^^^^^^^^^

Abstract base type for AI models.

Annotations:

* ``model_name`` - Model identifier
* ``model_provider`` - Provider identifier

ext::ai::EmbeddingModel
^^^^^^^^^^^^^^^^^^^^^^^

Abstract type for embedding models.

Annotations:

* ``embedding_model_max_input_tokens`` - Maximum tokens per input
* ``embedding_model_max_batch_tokens`` - Maximum tokens per batch
* ``embedding_model_max_output_dimensions`` - Maximum embedding dimensions
* ``embedding_model_supports_shortening`` - Input shortening support flag

ext::ai::TextGenerationModel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Abstract type for text generation models.

Annotations:

* ``text_gen_model_context_window`` - Model's context window size

Indexing Types
--------------

ext::ai::DistanceFunction
^^^^^^^^^^^^^^^^^^^^^^^^^

Enum for similarity metrics:

* ``Cosine``
* ``InnerProduct``
* ``L2``

ext::ai::IndexType
^^^^^^^^^^^^^^^^^^

Enum for index implementations:

* ``HNSW``
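To illustrate how the enums above combine with the index parameters described
earlier, here is a sketch of an index declaration that overrides the distance
function and HNSW parameters. The ``Document`` type and the parameter values
are hypothetical:

.. code-block:: sdl

    module default {
        type Document {
            content: str;
            deferred index ext::ai::index(
                embedding_model := 'text-embedding-3-small',
                distance_function := ext::ai::DistanceFunction.InnerProduct,
                index_parameters := (m := 64, ef_construction := 200),
            )
                on (.content);
        }
    };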
+
+Model Types
+-----------
+
+ext::ai::Model
+^^^^^^^^^^^^^^
+
+Abstract base type for AI models.
+
+Annotations:
+
+* ``model_name`` - Model identifier
+* ``model_provider`` - Provider identifier
+
+ext::ai::EmbeddingModel
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Abstract type for embedding models.
+
+Annotations:
+
+* ``embedding_model_max_input_tokens`` - Maximum tokens per input
+* ``embedding_model_max_batch_tokens`` - Maximum tokens per batch
+* ``embedding_model_max_output_dimensions`` - Maximum embedding dimensions
+* ``embedding_model_supports_shortening`` - Input shortening support flag
+
+ext::ai::TextGenerationModel
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Abstract type for text generation models.
+
+Annotations:
+
+* ``text_gen_model_context_window`` - The model's context window size
+
+Indexing Types
+--------------
+
+ext::ai::DistanceFunction
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Enum for similarity metrics:
+
+* ``Cosine``
+* ``InnerProduct``
+* ``L2``
+
+ext::ai::IndexType
+^^^^^^^^^^^^^^^^^^
+
+Enum for index implementations:
+
+* ``HNSW``
+
+
+Prompt Types
+------------
+
+Example custom prompt configuration:
+
+.. code-block:: edgeql
+
+    insert ext::ai::ChatPrompt {
+      name := 'test-prompt',
+      messages := (
+        insert ext::ai::ChatPromptMessage {
+          participant_role := ext::ai::ChatParticipantRole.System,
+          content := "Your message content"
+        }
+      )
+    };
+
+ext::ai::ChatParticipantRole
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Enum for chat participants:
+
+* ``System``
+* ``User``
+* ``Assistant``
+* ``Tool``
+
+ext::ai::ChatPromptMessage
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Type for chat prompt messages.
+
+Properties:
+
+* ``participant_role``: ChatParticipantRole (Required)
+* ``participant_name``: str (Optional)
+* ``content``: str (Required)
+
+ext::ai::ChatPrompt
+^^^^^^^^^^^^^^^^^^^
+
+Type for chat prompt configuration.
+
+Properties:
+
+* ``name``: str (Required)
+* ``messages``: set of ChatPromptMessage (Required)
+
diff --git a/docs/ai/reference_http.rst b/docs/ai/reference_http.rst
new file mode 100644
index 00000000000..690fd38b27c
--- /dev/null
+++ b/docs/ai/reference_http.rst
@@ -0,0 +1,409 @@
+.. _ref_ai_http_reference:
+
+=====================
+AI HTTP API Reference
+=====================
+
+:edb-alt-title: AI Extension HTTP API
+
+.. note::
+
+    All |Gel| server HTTP endpoints require authentication, such as `HTTP
+    Basic Authentication
+    <https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme>`_
+    with a Gel username and password.
+
+
+Embeddings
+==========
+
+``POST``: ``https://<host>:<port>/branch/<branch-name>/ai/embeddings``
+
+Generates text embeddings using the specified embeddings model.
+
+
+Request headers
+---------------
+
+* ``Content-Type: application/json`` (required)
+
+
+Request body
+------------
+
+.. code-block:: json
+
+    {
+      "model": string,      // Required: Name of the embedding model
+      "input": string[],    // Required: Text(s) to embed
+      "dimensions": number, // Optional: Number of dimensions to truncate to
+      "user": string        // Optional: User identifier
+    }
+
+* ``input`` (array of strings or a single string, required): The text to use
+  as the basis for embeddings generation.
+
+* ``model`` (string, required): The name of the embedding model to use. You
+  may use any of the supported embedding models.
+
+* ``dimensions`` (number, optional): The number of dimensions the returned
+  embeddings should be truncated to.
+
+* ``user`` (string, optional): A user identifier.
+
+
+Example request
+---------------
+
+.. code-block:: bash
+
+    curl --user <username>:<password> --json '{
+      "input": "What color is the sky on Mars?",
+      "model": "text-embedding-3-small"
+    }' http://localhost:10931/branch/main/ai/embeddings
+
+
+Response
+--------
+
+* **HTTP status**: 200 OK
+* **Content-Type**: application/json
+* **Body**:
+
+  .. code-block:: json
+
+      {
+        "object": "list",
+        "data": [
+          {
+            "object": "embedding",
+            "index": 0,
+            "embedding": [-0.009434271, 0.009137661]
+          }
+        ],
+        "model": "text-embedding-3-small",
+        "usage": {
+          "prompt_tokens": 8,
+          "total_tokens": 8
+        }
+      }
+
+.. note::
+
+    The ``embedding`` property is shown here with only two values for
+    brevity, but an actual response would contain many more values.
+
+
+Error response
+--------------
+
+* **HTTP status**: 400 Bad Request
+* **Content-Type**: application/json
+* **Body**:
+
+  .. code-block:: json
+
+      {
+        "message": "missing or empty required \"model\" value in request",
+        "type": "BadRequestError"
+      }
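+
+As shown in the quickstart, the vector returned by this endpoint can be fed
+directly into ``ext::ai::search``. Below is a minimal sketch of that round
+trip in Python using the ``httpx`` and ``gel`` packages. The host, port,
+credential environment variables, and the ``Knowledge`` type are
+illustrative assumptions:
+
+.. code-block:: python
+
+    import os
+
+    import gel
+    import httpx
+
+    # Generate an embedding for the search string via the HTTP API.
+    resp = httpx.post(
+        "http://localhost:10931/branch/main/ai/embeddings",
+        auth=(os.environ["GEL_USERNAME"], os.environ["GEL_PASSWORD"]),
+        json={
+            "input": "What color is the sky on Mars?",
+            "model": "text-embedding-3-small",
+        },
+    )
+    resp.raise_for_status()
+    query_vector = resp.json()["data"][0]["embedding"]
+
+    # Pass the vector to ext::ai::search as a query parameter.
+    client = gel.create_client()
+    results = client.query(
+        "select ext::ai::search(Knowledge, <array<float32>>$v)",
+        v=query_vector,
+    )
+    for row in results:
+        print(row.object.id, row.distance)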
+
+RAG
+===
+
+``POST``: ``https://<host>:<port>/branch/<branch-name>/ai/rag``
+
+Performs retrieval-augmented text generation using the specified model, based
+on the provided text query and the database content selected using similarity
+search.
+
+
+Request headers
+---------------
+
+* ``Content-Type: application/json`` (required)
+
+
+Request body
+------------
+
+.. code-block:: json
+
+    {
+      "context": {
+        "query": string,            // Required: EdgeQL query for context retrieval
+        "variables": object,        // Optional: Query variables
+        "globals": object,          // Optional: Query globals
+        "max_object_count": number  // Optional: Max objects to retrieve (default: 5)
+      },
+      "model": string,              // Required: Name of the generation model
+      "query": string,              // Required: User query
+      "stream": boolean,            // Optional: Enable streaming (default: false)
+      "prompt": {
+        "name": string,             // Optional: Name of predefined prompt
+        "id": string,               // Optional: ID of predefined prompt
+        "custom": [                 // Optional: Custom prompt messages
+          {
+            "role": string,         // "system"|"user"|"assistant"|"tool"
+            "content": string|object,
+            "tool_call_id": string,
+            "tool_calls": array
+          }
+        ]
+      },
+      "temperature": number,        // Optional: Sampling temperature
+      "top_p": number,              // Optional: Nucleus sampling parameter
+      "max_tokens": number,         // Optional: Maximum tokens to generate
+      "seed": number,               // Optional: Random seed
+      "safe_prompt": boolean,       // Optional: Enable safety features
+      "top_k": number,              // Optional: Top-k sampling parameter
+      "logit_bias": object,         // Optional: Token biasing
+      "logprobs": number,           // Optional: Return token log probabilities
+      "user": string                // Optional: User identifier
+    }
+
+
+* ``model`` (string, required): The name of the text generation model to use.
+
+  .. _ref_ai_reference_text_generation_models:
+
+  List of supported text generation models:
+
+  **OpenAI**
+
+  * ``gpt-3.5-turbo``
+  * ``gpt-4-turbo-preview``
+
+  `Learn more about the OpenAI text generation models
+  <https://platform.openai.com/docs/models>`__
+
+  **Mistral**
+
+  * ``mistral-small-latest``
+  * ``mistral-medium-latest``
+  * ``mistral-large-latest``
+
+  `Learn more about the Mistral text generation models
+  <https://docs.mistral.ai/getting-started/models/>`__
+
+  **Anthropic**
+
+  * ``claude-3-haiku-20240307``
+  * ``claude-3-sonnet-20240229``
+  * ``claude-3-opus-20240229``
+
+  `Learn more about the Anthropic text generation models
+  <https://docs.anthropic.com/claude/docs/models-overview>`__
+
+* ``query`` (string, required): The query string to use as the basis for
+  text generation.
+
+* ``context`` (object, required): Settings that define the context of the
+  query.
+
+  * ``query`` (string, required): Specifies an expression to determine the
+    relevant objects and index to serve as context for text generation. You
+    may set this to any expression that produces a set of objects, even if
+    it is not a standalone query.
+
+  * ``variables`` (object, optional): A dictionary of variables for use in
+    the context query.
+
+  * ``globals`` (object, optional): A dictionary of globals for use in the
+    context query.
+
+  * ``max_object_count`` (int, optional): Maximum number of objects to
+    return; default is 5.
+
+* ``stream`` (boolean, optional): Specifies whether the response should be
+  streamed. Defaults to ``false``.
+
+* ``prompt`` (object, optional): Settings that define a prompt. Omit to use
+  the default prompt.
+
+  You may specify an existing prompt by its ``name`` or ``id``, you may
+  define a custom prompt inline by sending an array of objects, or you may
+  do both to augment an existing prompt with additional custom messages. A
+  request combining these options is sketched after this list.
+
+  * ``name`` (string, optional) or ``id`` (string, optional): The ``name``
+    or ``id`` of an existing custom prompt to use. Provide only one of
+    these if you want to use or start from an existing prompt.
+
+  * ``custom`` (array of objects, optional): Custom prompt messages, each
+    containing a ``role`` and ``content``. If no ``name`` or ``id`` was
+    provided, the custom messages provided here become the prompt. If one
+    of those was provided, these messages will be added to that existing
+    prompt.
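+
+For illustration, here is a sketch of a request body that starts from a
+saved prompt, appends one custom message, and passes a variable to the
+context query. The prompt name, context expression, and variable are
+hypothetical:
+
+.. code-block:: json
+
+    {
+      "model": "gpt-4-turbo-preview",
+      "query": "What color is the sky on Mars?",
+      "context": {
+        "query": "select Knowledge filter .category = <str>$category",
+        "variables": {"category": "astronomy"},
+        "max_object_count": 3
+      },
+      "prompt": {
+        "name": "my-prompt",
+        "custom": [
+          {"role": "user", "content": "Answer in one sentence."}
+        ]
+      },
+      "stream": false
+    }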
+
+
+Example request
+---------------
+
+.. code-block:: bash
+
+    curl --user <username>:<password> --json '{
+      "query": "What color is the sky on Mars?",
+      "model": "gpt-4-turbo-preview",
+      "context": {"query":"Knowledge"}
+    }' http://<host>:<port>/branch/main/ai/rag
+
+
+Response
+--------
+
+* **HTTP status**: 200 OK
+* **Content-Type**: application/json
+* **Body**:
+
+  .. code-block:: json
+
+      {"response": "The sky on Mars is red."}
+
+Error response
+--------------
+
+* **HTTP status**: 400 Bad Request
+* **Content-Type**: application/json
+* **Body**:
+
+  .. code-block:: json
+
+      {
+        "message": "missing required 'query' in request 'context' object",
+        "type": "BadRequestError"
+      }
+
+
+Streaming response (SSE)
+------------------------
+
+When the ``stream`` parameter is set to ``true``, the server uses
+`Server-Sent Events
+<https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events>`__
+(SSE) to stream responses. Here is a detailed breakdown of the typical
+sequence and structure of events in a streaming response:
+
+* **HTTP Status**: 200 OK
+* **Content-Type**: text/event-stream
+* **Cache-Control**: no-cache
+
+The stream consists of a sequence of six event types, each encapsulating
+part of the response in a structured format:
+
+1. **Message start**
+
+   * Event type: ``message_start``
+
+   * Data: Starts a message, specifying identifiers and roles.
+
+     .. code-block:: json
+
+         {
+           "type": "message_start",
+           "message": {
+             "id": "<message_id>",
+             "role": "assistant",
+             "model": "<model_name>"
+           }
+         }
+
+2. **Content block start**
+
+   * Event type: ``content_block_start``
+
+   * Data: Marks the beginning of a new content block.
+
+     .. code-block:: json
+
+         {
+           "type": "content_block_start",
+           "index": 0,
+           "content_block": {
+             "type": "text",
+             "text": ""
+           }
+         }
+
+3. **Content block delta**
+
+   * Event type: ``content_block_delta``
+
+   * Data: Incrementally updates the content, appending more text to the
+     message.
+
+     .. code-block:: json
+
+         {
+           "type": "content_block_delta",
+           "index": 0,
+           "delta": {
+             "type": "text_delta",
+             "text": "The"
+           }
+         }
+
+   Subsequent ``content_block_delta`` events add more text to the message.
+
+4. **Content block stop**
+
+   * Event type: ``content_block_stop``
+
+   * Data: Marks the end of a content block.
+
+     .. code-block:: json
+
+         {
+           "type": "content_block_stop",
+           "index": 0
+         }
+
+5. **Message delta**
+
+   * Event type: ``message_delta``
+
+   * Data: Reports the reason the model stopped generating.
+
+     .. code-block:: json
+
+         {
+           "type": "message_delta",
+           "delta": {"stop_reason": "stop"}
+         }
+
+6. **Message stop**
+
+   * Event type: ``message_stop``
+
+   * Data: Marks the end of the message.
+
+     .. code-block:: json
+
+         {"type": "message_stop"}
+
+Each event is sent as a separate SSE message, formatted as shown above. The
+connection is closed after all events are sent, signaling the end of the
+stream.
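+
+A minimal sketch of consuming this stream from Python with the ``httpx``
+package might look as follows. The host, port, and credentials are
+illustrative assumptions:
+
+.. code-block:: python
+
+    import json
+
+    import httpx
+
+    payload = {
+        "query": "What color is the sky on Mars?",
+        "model": "gpt-4-turbo-preview",
+        "context": {"query": "Knowledge"},
+        "stream": True,
+    }
+
+    # Stream the response and print each text delta as it arrives.
+    with httpx.stream(
+        "POST",
+        "http://localhost:10931/branch/main/ai/rag",
+        json=payload,
+        auth=("<username>", "<password>"),
+        timeout=None,
+    ) as response:
+        for line in response.iter_lines():
+            if not line.startswith("data:"):
+                continue  # skip "event:" lines and blank separators
+            event = json.loads(line[len("data:"):])
+            if event["type"] == "content_block_delta":
+                print(event["delta"]["text"], end="", flush=True)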
+**Example SSE response**
+
+.. code-block::
+    :class: collapsible
+
+    event: message_start
+    data: {"type": "message_start", "message": {"id": "chatcmpl-9MzuQiF0SxUjFLRjIdT3mTVaMWwiv", "role": "assistant", "model": "gpt-4-0125-preview"}}
+
+    event: content_block_start
+    data: {"type": "content_block_start","index":0,"content_block":{"type":"text","text":""}}
+
+    event: content_block_delta
+    data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": "The"}}
+
+    event: content_block_delta
+    data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": " skies"}}
+
+    event: content_block_delta
+    data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": " on"}}
+
+    event: content_block_delta
+    data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": " Mars"}}
+
+    event: content_block_delta
+    data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": " are"}}
+
+    event: content_block_delta
+    data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": " red"}}
+
+    event: content_block_delta
+    data: {"type": "content_block_delta","index":0,"delta":{"type": "text_delta", "text": "."}}
+
+    event: content_block_stop
+    data: {"type": "content_block_stop","index":0}
+
+    event: message_delta
+    data: {"type": "message_delta", "delta": {"stop_reason": "stop"}}
+
+    event: message_stop
+    data: {"type": "message_stop"}
+
diff --git a/docs/ai/python.rst b/docs/ai/reference_python.rst
similarity index 91%
rename from docs/ai/python.rst
rename to docs/ai/reference_python.rst
index 19d0af9a791..3f0409731e3 100644
--- a/docs/ai/python.rst
+++ b/docs/ai/reference_python.rst
@@ -1,82 +1,56 @@
-.. _ref_ai_python:
+.. _ref_ai_python_reference:
 
-======
-Python
-======
+=============
+AI Python API
+=============
 
-:edb-alt-title: Gel AI's Python package
+:edb-alt-title: AI Extension Python API
 
 The ``gel.ai`` package is an optional binding of the AI extension in |Gel|.
 
-To use the AI binding, you need to install ``gel`` Python package with the
-``ai`` extra dependencies:
 
 .. code-block:: bash
 
     $ pip install 'gel[ai]'
 
 
-Usage
-=====
+Blocking and async API
+======================
+
+The AI binding is built on top of the regular |Gel| client objects, providing
+both blocking and asynchronous versions of its API.
 
-Start by importing ``gel`` and ``gel.ai``:
+**Blocking client example**:
 
 .. code-block:: python
 
     import gel
    import gel.ai
 
-
-Blocking
---------
-
-The AI binding is built on top of the regular |Gel| client objects, providing
-both blocking and asynchronous versions of its API. For example, a blocking AI
-client is initialized like this:
-
-.. code-block:: python
-
    client = gel.create_client()
 
    gpt4ai = gel.ai.create_ai(
        client,
        model="gpt-4-turbo-preview"
    )
 
-Add your query as context:
-
-.. code-block:: python
-
    astronomy_ai = gpt4ai.with_context(
        query="Astronomy"
    )
 
-The default text generation prompt will ask your selected provider to limit
-answer to information provided in the context and will pass the queried
-objects' AI index as context along with that prompt.
-
-Call your AI client's ``query_rag`` method, passing in a text query.
-
-.. code-block:: python
-
    print(
        astronomy_ai.query_rag("What color is the sky on Mars?")
    );
 
-or stream back the results by using ``stream_rag`` instead:
-
-.. code-block:: python
-
    for data in astronomy_ai.stream_rag("What color is the sky on Mars?"):
        print(data)
 
-Async
------
-
-To use an async client instead, do this:
+**Async client example**:
+
+.. code-block:: python
 
-    import asyncio  # alongside the Gel imports
+    import gel
+    import gel.ai
+    import asyncio
 
     client = gel.create_async_client()
 
@@ -100,8 +74,8 @@ To use an async client instead, do this:
     asyncio.run(main())
 
-API reference
-=============
+Factory functions
+=================
 
 .. py:function:: create_ai(client, **kwargs) -> GelAI
 
@@ -140,9 +114,8 @@ API reference
 * ``prompt``: An optional prompt to guide the model's behavior.
   (default: None)
 
-AI client classes
------------------
-
+Core classes
+============
 
 BaseGelAI
 ^^^^^^^^^
 
@@ -253,6 +226,14 @@ GelAI
         the query. If not provided, uses the default context of this AI
        client instance.
 
+.. py:method:: generate_embeddings(*inputs: str, model: str) -> list[float]
+
+    Generates embeddings for input texts.
+
+    :param *inputs:
+        Input texts.
+    :param model:
+        The embedding model to use.
 
 AsyncGelAI
 ^^^^^^^^^^
 
@@ -301,9 +282,18 @@ AsyncGelAI
         the query. If not provided, uses the default context of this AI
        client instance.
 
+.. py:method:: generate_embeddings(*inputs: str, model: str) -> list[float]
+
+    Generates embeddings for input texts.
 
-Other classes
--------------
+    :param *inputs:
+        Input texts.
+    :param model:
+        The embedding model to use.
+
+
+Configuration classes
+=====================
 
 .. py:class:: ChatParticipantRole
 
@@ -414,3 +404,4 @@ Other classes
     :method to_httpx_request(): Converts the RAGRequest into a dictionary
        suitable for making an HTTP request using the httpx library.
+
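+
+For instance, a blocking client can combine ``generate_embeddings`` with the
+``ext::ai::search`` function from the EdgeQL reference. This is a minimal
+sketch; the ``Knowledge`` type and the model names are illustrative
+assumptions:
+
+.. code-block:: python
+
+    import gel
+    import gel.ai
+
+    client = gel.create_client()
+    my_ai = gel.ai.create_ai(client, model="gpt-4-turbo-preview")
+
+    # Embed the search text, then run a similarity search with the vector.
+    vector = my_ai.generate_embeddings(
+        "What color is the sky on Mars?",
+        model="text-embedding-3-small",
+    )
+    results = client.query(
+        "select ext::ai::search(Knowledge, <array<float32>>$v)",
+        v=vector,
+    )
+    for row in results:
+        print(row.object.id, row.distance)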