Merge pull request #5728 from EnterpriseDB/release-2024-06-04a
Release 2024-06-04a
djw-m authored Jun 4, 2024
2 parents 8e8d651 + f85ea19 commit 5e4dbc8
Showing 56 changed files with 462 additions and 489 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/sync-and-process-files.yml
@@ -53,4 +53,4 @@ jobs:
path: destination/
reviewers: ${{ env.REVIEWERS }}
title: ${{ env.TITLE }}
token: ${{ secrets.SYNC_FILES_TOKEN }}
token: ${{ secrets.GH_TOKEN }}
9 changes: 4 additions & 5 deletions advocacy_docs/edb-postgres-ai/ai-ml/install-tech-preview.mdx
@@ -9,10 +9,9 @@ The preview release of pgai is distributed as a self-contained Docker container

## Configuring and running the container image

If you havent already, sign up for an EDB account and log in to the EDB container registry.
If you haven't already, sign up for an EDB account and log in to the EDB container registry.


Log in to docker with your the username tech-preview and your EDB Repo 2.0 Subscription Token as your password:
Log in to Docker with the username tech-preview and your EDB Repo 2.0 subscription token as your password:

```shell
docker login docker.enterprisedb.com -u tech-preview -p <your_EDB_repo_token>
@@ -65,13 +64,13 @@ docker run -d --name pgai \

## Connect to Postgres

If you havent yet, install the Postgres command-line tools. If youre on a Mac, using Homebrew, you can install it as follows:
If you haven't yet, install the Postgres command-line tools. If you're on a Mac, using Homebrew, you can install it as follows:

```shell
brew install libpq
```

Connect to the tech preview PostgreSQL running in the container. Note that this relies on $PGPASSWORD being set - if youre using a different terminal for this part, make sure you re-export the password:
Connect to the tech preview PostgreSQL running in the container. Note that this relies on $PGPASSWORD being set; if you're using a different terminal for this part, make sure you re-export the password:

```shell
psql -h localhost -p 15432 -U postgres postgres
@@ -1,5 +1,5 @@
---
title: Additional functions and stand-alone embedding in pgai
title: Additional functions and standalone embedding in pgai
navTitle: Additional functions
description: Other pgai extension functions and how to generate embeddings for images and text.
---
@@ -8,7 +8,7 @@ We recommend you to prepare your own S3 compatible object storage bucket with so

In addition, we use image data and a corresponding image encoder LLM in this example instead of text data. But you could also use plain text data on object storage, similar to the examples in the previous section.

First lets create a retriever for images stored on s3-compatible object storage as the source. We specify torsten as the bucket name and an endpoint URL where the bucket is created. We specify an empty string as prefix because we want all the objects in that bucket. We use the [`clip-vit-base-patch32`](https://huggingface.co/openai/clip-vit-base-patch32) open encoder model for image data from HuggingFace. We provide a name for the retriever so that we can identify and reference it subsequent operations:
First let's create a retriever for images stored on S3-compatible object storage as the source. We specify torsten as the bucket name and an endpoint URL where the bucket is created. We specify an empty string as the prefix because we want all the objects in that bucket. We use the [`clip-vit-base-patch32`](https://huggingface.co/openai/clip-vit-base-patch32) open encoder model for image data from HuggingFace. We provide a name for the retriever so that we can identify and reference it in subsequent operations:

```sql
SELECT pgai.create_s3_retriever(
@@ -39,7 +39,7 @@ __OUTPUT__
(1 row)
```

Finally, run the retrieve_via_s3 function with the required parameters to retrieve the top K most relevant (most similar) AI data items. Please be aware that the object type is currently limited to image and text files.
Finally, run the retrieve_via_s3 function with the required parameters to retrieve the top K most relevant (most similar) AI data items. Be aware that the object type is currently limited to image and text files.

```sql
SELECT data from pgai.retrieve_via_s3(
@@ -6,9 +6,9 @@ description: How to work with AI data stored in Postgres tables using the pgai e

We will first look at working with AI data stored in columns in the Postgres table.

To see how to use AI data stored in S3-compatible object storage, please skip to the next section.
To see how to use AI data stored in S3-compatible object storage, skip to the next section.

First lets create a Postgres table for some test AI data:
First let's create a Postgres table for some test AI data:

```sql
CREATE TABLE products (
@@ -22,7 +22,7 @@ CREATE TABLE
```


Now lets create a retriever with the just created products table as the source. We specify product_id as the unique key column to and we define the product_name and description columns to use for the similarity search by the retriever. We use the `all-MiniLM-L6-v2` open encoder model from HuggingFace. We set `auto_embedding` to True so that any future insert, update or delete to the source table will automatically generate, update or delete also the corresponding embedding. We provide a name for the retriever so that we can identify and reference it subsequent operations:
Now let's create a retriever with the just-created products table as the source. We specify product_id as the unique key column, and we define the product_name and description columns to use for the similarity search by the retriever. We use the `all-MiniLM-L6-v2` open encoder model from HuggingFace. We set `auto_embedding` to True so that any future insert, update, or delete to the source table will also automatically generate, update, or delete the corresponding embedding. We provide a name for the retriever so that we can identify and reference it in subsequent operations:

```sql
SELECT pgai.create_pg_retriever(
@@ -44,7 +44,7 @@ __OUTPUT__



Now lets insert some AI data records into the products table. Since we have set auto_embedding to True, the retriever will automatically generate all embeddings in real-time for each inserted record:
Now let's insert some AI data records into the products table. Since we have set auto_embedding to True, the retriever will automatically generate all embeddings in real-time for each inserted record:

```sql
INSERT INTO products (product_name, description) VALUES
@@ -80,7 +80,7 @@ __OUTPUT__
(5 rows)
```

Now lets try a retriever without auto embedding. This means that the application has control over when the embeddings are computed in a bulk fashion. For demonstration we can simply create a second retriever for the same products table that we just created above:
Now let's try a retriever without auto embedding. This means that the application has control over when the embeddings are computed in a bulk fashion. For demonstration we can simply create a second retriever for the same products table that we just created above:

```sql
SELECT pgai.create_pg_retriever(
@@ -115,7 +115,7 @@ __OUTPUT__
(0 rows)
```

Thats why we first need to run a bulk generation of embeddings. This is achieved via the `refresh_retriever()` function:
That's why we first need to run a bulk generation of embeddings. This is achieved via the `refresh_retriever()` function:

```sql
SELECT pgai.refresh_retriever(
@@ -148,7 +148,7 @@ __OUTPUT__
(5 rows)
```

Now lets see what happens if we add additional AI data records:
Now let's see what happens if we add additional AI data records:

```sql
INSERT INTO products (product_name, description) VALUES
36 changes: 18 additions & 18 deletions advocacy_docs/edb-postgres-ai/analytics/concepts.mdx
@@ -7,45 +7,45 @@ description: Learn about the ideas and terminology behind EDB Postgres Lakehouse
EDB Postgres Lakehouse is the solution for running Rapid Analytics against
operational data on the EDB Postgres® AI platform.

## Major Concepts
## Major concepts

* **Lakehouse Nodes** query **Lakehouse Tables** in **Managed Storage Locations**.
* **Lakehouse Sync** can create **Lakehouse Tables** from **Transactional Tables** in a source database.
* **Lakehouse nodes** query **Lakehouse tables** in **managed storage locations**.
* **Lakehouse Sync** can create **Lakehouse tables** from **Transactional tables** in a source database.

Here's how it fits together:

![Level 50 basic architecture](./images/level-50-architecture.png)

### Lakehouse Node
### Lakehouse node

A Postgres Lakehouse Node is Postgres, with a Vectorized Query Engine that's
optimized to query Lakehouse Tables, but still fall back to Postgres for full
A Postgres Lakehouse node is Postgres, with a Vectorized Query Engine that's
optimized to query Lakehouse tables but still falls back to Postgres for full
compatibility.

Lakehouse nodes are stateless and ephemeral. Scale them up or down based on
workload requirements.

### Lakehouse Tables
### Lakehouse tables

Lakehouse Tables are stored using highly compresible, columnar storage formats
Lakehouse Tables are stored using highly compressible, columnar storage formats
optimized for analytics and interoperable with the rest of the Analytics ecosystem.
Currently, Postgres Lakehouse Nodes can read tables stored using the Delta
Currently, Postgres Lakehouse nodes can read tables stored using the Delta
Protocol ("delta tables"), and Lakehouse Sync can write them.

### Managed Storage Location
### Managed storage location

A Managed Storage Location is where you can organize Lakehouse Tables in
A *managed storage location* is where you can organize Lakehouse tables in
object storage, so that Postgres Lakehouse can query them.

A "Managed Storage Location" is a location in object storage where we control
A managed storage location is a location in object storage where we control
the file layout and write Lakehouse Tables on your behalf. Technically, it's an
implementation detail that we store these in buckets. This is really a subset
of an upcoming "Storage Location" feature that will also support
"External Storage Locations," where you bring your own bucket.

### Lakehouse Sync

Lakehouse Sync is a Data Migration Service offered as part of the EDB
Lakehouse Sync is a data migration service offered as part of the EDB
Postgres AI platform. It can "sync" tables from a transactional database, to
Lakehouse Tables in a destination Storage Location. Currently, it supports
source databases hosted in the EDB Postgres AI Cloud Service (formerly known as
@@ -58,28 +58,28 @@ It's built using [Debezium](https://debezium.io).
### Lakehouse

The
"[Lakehouse Architecture](https://15721.courses.cs.cmu.edu/spring2023/papers/02-modern/armbrust-cidr21.pdf)"
"[Lakehouse architecture](https://15721.courses.cs.cmu.edu/spring2023/papers/02-modern/armbrust-cidr21.pdf)"
is a data engineering practice, which is a portmanteau of "Data _Lake_" and "Data
Ware_house_," offering the best of both. The central tenet of the architecture is
that data is stored in Object Storage, generally in columnar formats like
Parquet, where different query engines can process it for their own specialized
purposes, using the optimal compute resources for a given query.

### Vectorized Query Engine
### Vectorized query engine

A vectorized query engine is a query engine that's optimized for running queries
on columnar data. Most analytics engines use vectorized query execution.
Postgres Lakehouse uses [Apache DataFusion](https://datafusion.apache.org/).

### Delta Tables
### Delta tables

We use the term "Lakehouse Tables" to avoid overcommitting to a particular
We use the term "Lakehouse tables" to avoid overcommitting to a particular
format (since we might eventually support Iceberg or Hudi, for example). But
technically, we're using [Delta Tables](https://delta.io/). A Delta Table
is a well-defined container of Parquet files and JSON metadata, according to
the "Delta Lake" spec and open protocol. Delta Lake is a Linux Foundation project.

## How it Works
## How it works

Postgres Lakehouse is built using a number of technologies:

33 changes: 16 additions & 17 deletions advocacy_docs/edb-postgres-ai/analytics/index.mdx
@@ -1,6 +1,6 @@
---
title: Lakehouse Analytics
navTitle: Lakehouse Analytics
title: Lakehouse analytics
navTitle: Lakehouse analytics
indexCards: simple
iconName: Improve
navigation:
@@ -11,19 +11,19 @@ navigation:

EDB Postgres Lakehouse extends the power of Postgres to analytical workloads,
by adding a vectorized query engine and separating storage from compute. Building
a Data Lakehouse has never been easier just use Postgres.
a data Lakehouse has never been easier: just use Postgres.

## Rapid Analytics for Postgres
## Rapid analytics for Postgres

Postgres Lakehouse is a core offering of the EDB Postgres® AI platform, extending
Postgres to support analytical queries over columnar data in object storage,
while keeping the simplicity and ease of use that Postgres users love.

With Postgres Lakehouse, you can query your Postgres data with a Lakehouse Node,
an ephemeral, scale-to-zero compute resource powered by Postgres thats optimized for
With Postgres Lakehouse, you can query your Postgres data with a Lakehouse node,
an ephemeral, scale-to-zero compute resource powered by Postgres that's optimized for
vectorized query execution over columnar data.

## Postgres Native
## Postgres native

Never leave the Postgres ecosystem.

Expand All @@ -33,16 +33,16 @@ columnar tables in object storage using the open source Delta Lake protocol.

EDB Postgres Lakehouse is “just Postgres” – you can query it with any Postgres
client, and it fully supports all Postgres queries, functions and statements, so
theres no need to change existing queries or reconfigure business
there's no need to change existing queries or reconfigure business
intelligence software.

## Vectorized Execution
## Vectorized execution

Postgres Lakehouse uses Apache DataFusions vectorized SQL query engine to
Postgres Lakehouse uses Apache DataFusion's vectorized SQL query engine to
execute analytical queries 5-100x faster (30x on average) compared to native
Postgres, while still falling back to native execution when necessary.

## Columnar Storage
## Columnar storage

Postgres Lakehouse is optimized to query "Lakehouse Tables" in object storage,
extending the power of open source database to open table formats. Currently,
Expand All @@ -54,20 +54,19 @@ You can sync your own data from tables in transactional sources (initially, EDB
Postgres® AI Cloud Service databases) into Lakehouse Tables in Storage Locations
(initially, managed locations in S3 object storage).

## Fully Managed Service
## Fully managed service

You can launch Postgres Lakehouse nodes using the EDB Postgres AI Cloud
Service (formerly EDB BigAnimal). Point a Lakehouse Node at a storage bucket
Service (formerly EDB BigAnimal). Point a Lakehouse node at a storage bucket
with some Delta Tables in it, and get results of analytical (OLAP) queries in
less time than if you queried the same data in a transactional Postgres database.

Postgres Lakehouse nodes are available now for customers using
EDB Postgres AI - Hosted environments on AWS, and will be rolling out
to additional cloud environments soon.

## Try Today
## Try it today

Its easy to start using Postgres Lakehouse. Provision a Lakehouse Node in five
minutes, and start qureying pre-loaded benchmark data like TPC-H, TPC-DS,
It's easy to start using Postgres Lakehouse. Provision a Lakehouse node in five
minutes, and start querying pre-loaded benchmark data like TPC-H, TPC-DS,
Clickbench, and the 1 Billion Row challenge.

36 changes: 18 additions & 18 deletions advocacy_docs/edb-postgres-ai/analytics/quick_start.mdx
@@ -1,20 +1,20 @@
---
title: Quick Start - EDB Postgres Lakehouse
navTitle: Quick Start
description: Launch a Lakehouse Node and query sample data.
description: Launch a Lakehouse node and query sample data.
---

In this guide, you will:

1. Create a Lakehouse Node
1. Create a Lakehouse node
2. Connect to the node with your preferred Postgres client
3. Query sample data (TPC-H, TPC-DS, Clickbench, or 1BRC) in object storage

For more details and advanced use cases, see [reference](./reference).

## Introduction

Postgres Lakehouse is a new type of Postgres “cluster” (its really just one
Postgres Lakehouse is a new type of Postgres “cluster” (it's really just one
node) that you can provision in EDB Postgres® AI Cloud Services (formerly known
as "BigAnimal"). It includes a vectorized query engine (based on Apache
[DataFusion](https://github.com/apache/datafusion)) for fast queries over
@@ -39,18 +39,18 @@ restarts and will be saved as part of backup/restore operations. Otherwise,
Lakehouse tables will not be part of backups, since they are ultimately stored
in object storage.

### Basic Architecture
### Basic architecture

Here's "what's in the box of a Lakehouse Node:
Here's what's in the box of a Lakehouse node:

![Level 300 Architecture of Postgres Lakehouse Node](./images/level-300-architecture.png)
![Level 300 Architecture of Postgres Lakehouse node](./images/level-300-architecture.png)

## Getting Started
## Getting started

You will need an EDB Postgres AI account. Once youve logged in and created
You will need an EDB Postgres AI account. Once you've logged in and created
a project, you can create a cluster.

### Create a Lakehouse Node
### Create a Lakehouse node

You will see a “Lakehouse Analytics” option under the “Create New” dropdown
on your project page:
@@ -79,13 +79,13 @@ block storage device and will survive a restart or backup/restore cycle.
* Only Postgres 16 is supported.

For more notes about supported instance sizes,
see [reference - supported AWS instances](./reference/#supported-aws-instances).
see [Reference - Supported AWS instances](./reference/#supported-aws-instances).

## Operating a Lakehouse Node
## Operating a Lakehouse node

### Connect to the Node
### Connect to the node

You can connect to the Lakehouse Node with any Postgres client, in the same way
You can connect to the Lakehouse node with any Postgres client, in the same way
that you connect to any other cluster from EDB Postgres AI Cloud Service
(formerly known as BigAnimal): navigate to the cluster detail page and copy its
connection string.
@@ -121,9 +121,9 @@ remain untouched.
storage (but it supports write queries to system tables for creating users,
etc.). You cannot write directly to object storage. You cannot create new tables.
* If you want to load your own data into object storage,
see [reference - bring your own data](./reference/#advanced-bring-your-own-data).
see [Reference - Bring your own data](./reference/#advanced-bring-your-own-data).
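
To make the read-only behavior described above concrete, here's a minimal sketch; it isn't taken from the EDB docs, and the exact error text you see may differ on your node:

```sql
-- Writes to system tables are allowed, for example creating a user:
CREATE USER analyst WITH PASSWORD 'change-me';

-- Writes to user data are not; creating a new table is expected to fail,
-- because Lakehouse nodes are read-only for object storage:
CREATE TABLE my_table (id int);
```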

## Inspect the Benchmark Datasets
## Inspect the benchmark datasets

Inspect the Benchmark Datasets. Every cluster has some benchmarking data
available out of the box. If you are using pgcli, you can run `\dn` to see
@@ -137,9 +137,9 @@ The available benchmarking datasets are:
* 1 Billion Row Challenge
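
If you'd rather not use a psql meta-command, an ordinary catalog query returns the same information. This is plain Postgres rather than anything Lakehouse-specific, and the schema names you see will vary by node:

```sql
-- List non-system schemas; the preloaded benchmark datasets each live in
-- their own schema (compare with the output of \dn).
SELECT nspname AS schema_name
FROM pg_catalog.pg_namespace
WHERE nspname !~ '^pg_'
  AND nspname <> 'information_schema'
ORDER BY nspname;
```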

For more details on benchmark datasets,
see [reference - available benchmarking datasets](./reference/#available-benchmarking-datasets).
see [Reference - Available benchmarking datasets](./reference/#available-benchmarking-datasets).

## Query the Benchmark Datasets
## Query the benchmark datasets

You can try running some basic queries:

@@ -164,5 +164,5 @@ SELECT 1
Time: 0.651s
```
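
For something a bit more representative, a TPC-H style aggregation might look like the following sketch. The schema name here is an assumption (check `\dn` for the actual names on your node), and the table is fully qualified so the query doesn't depend on `search_path`:

```sql
-- Hypothetical schema name; substitute whatever \dn reports on your node.
SELECT l_returnflag,
       l_linestatus,
       count(*)             AS order_lines,
       sum(l_extendedprice) AS total_price
FROM tpch_sf_1.lineitem
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
```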

Note: Do not use `search_path`! Please read the [reference](./reference)
Note: Do not use `search_path`! Read the [reference](./reference)
page for more gotchas and information about syntax/query compatibility.