Commit 06a5c3a

blog post: Build a Vector Extension for Postgres - Introduction (#2)
1 parent 16e5725 commit 06a5c3a

File tree

4 files changed

+172
-0
lines changed

assets/images/avatar/SteveLauC.jpg

22.3 KB

content/english/authors/SteveLauC.md

+12
@@ -0,0 +1,12 @@
1+
---
2+
title: SteveLauC
3+
4+
image: "/images/avatar/SteveLauC.jpg"
5+
description: Software Engineer
6+
social:
7+
- name: github
8+
icon: fa-brands fa-github
9+
link: https://github.com/SteveLauC
10+
---
11+
12+
Steve is just a guy seeking happiness, value, and doing something good along the way.
@@ -0,0 +1,160 @@
1+
---
2+
title: "Build a Vector Extension for Postgres - Introduction"
3+
meta_title: "Build a Vector Extension for Postgres - Introduction"
4+
description: ""
5+
date: 2024-12-18T08:00:00Z
6+
image: "/images/posts/2024/build_a_vector_extension_for_postgres_introduction/bg.png"
7+
categories: ["vector database", "Postgres"]
8+
author: "SteveLauC"
9+
tags: ["vector database", "Postgres"]
10+
draft: false
11+
---
12+
13+
## Why and What
14+
15+
Vector databases are a hot topic right now. I have always been curious about what they are and how they work under the hood, so let's build one ourselves. Building a whole new database from scratch is not practical, so we need some building blocks or, better, an existing database system to build on. Postgres has a long-standing reputation for extensibility, which makes it a good fit for our needs, and projects like [pgvector][pgvector] have already demonstrated that adding vector support to Postgres as an extension is viable.
16+
17+
We are going to implement vector support for Postgres, but which features exactly? That is not a hard question: the definition of a [vector database][vector_db_wikipedia] on Wikipedia points us in the right direction:
18+
19+
> A vector database, vector store or vector search engine is a database that can store vectors (fixed-length lists of numbers) along with other data items. Vector databases typically implement one or more Approximate Nearest Neighbor algorithms so that one can search the database with a query vector to retrieve the closest matching database records
20+
21+
So we need to enable Postgres to store vectors and to perform Top-K queries, i.e., for a given input vector, Postgres should return the K vectors that are most similar (or closest) to it. Expressed in SQL, it would look like this:
22+
23+
```sql
24+
-- Create a table, which has a column of type `vector(3)`, 3 is the dimension of the vector
25+
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
26+
27+
-- Insert vectors, Postgres should store them!
28+
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
29+
30+
-- Now, Postgres should return the Top-5 vectors that are most similar to
31+
-- [3, 1, 2]
32+
SELECT * FROM items ORDER BY embedding <=> '[3,1,2]' LIMIT 5;
33+
```
34+
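To make the Top-K idea concrete, here is a minimal sketch in plain Rust of what `ORDER BY ... LIMIT k` asks for: score every stored vector against the query, sort by distance, and keep the first `k`. The names `squared_l2` and `top_k` are hypothetical, and squared Euclidean distance is only a stand-in metric; the post has not yet said what `<=>` will compute. Real vector databases usually replace this exhaustive scan with an approximate index.

```rust
/// Squared Euclidean distance, used here only as a placeholder metric
/// (an assumption; the post does not define the metric at this point).
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Returns the ids of the `k` rows closest to `query`, nearest first.
/// This is an exhaustive scan: every row is scored, then sorted.
fn top_k(rows: &[(i64, Vec<f32>)], query: &[f32], k: usize) -> Vec<i64> {
    let mut scored: Vec<(f32, i64)> = rows
        .iter()
        .map(|(id, v)| (squared_l2(v, query), *id))
        .collect();
    // `total_cmp` provides a total order on f32, so NaN cannot break the sort.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    // If fewer than `k` rows exist, all of them are returned.
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}
```

With the two rows from the SQL example and `k = 5`, the function simply returns both ids, because fewer than five rows exist.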
35+
The example SQL makes things clear. To summarize, we need to:
36+
37+
1. Implement a `vector` type for Postgres that accepts a dimension parameter
38+
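2. Implement the `<=>` binary operator, which takes two vectors and returns a value measuring how far apart they are. Because the query above sorts in ascending order, a smaller value must mean a more similar vector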
39+
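The two requirements can be sketched in plain Rust, without any Postgres integration. In this sketch, `Vector`, `parse_vector`, and `cosine_distance` are hypothetical names, not the types the later posts will define. It also assumes that `<=>` means cosine distance, as it does in pgvector; the post itself does not commit to a metric here.

```rust
/// Hypothetical in-memory representation of a `vector(n)` value.
#[derive(Debug, PartialEq)]
struct Vector(Vec<f32>);

/// Parses a text literal such as `[1,2,3]` and rejects inputs whose
/// length differs from the declared dimension `dim`.
fn parse_vector(text: &str, dim: usize) -> Result<Vector, String> {
    let inner = text
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or_else(|| format!("expected `[...]`, got `{text}`"))?;
    let values = inner
        .split(',')
        .map(|part| part.trim().parse::<f32>().map_err(|e| e.to_string()))
        .collect::<Result<Vec<f32>, String>>()?;
    if values.len() != dim {
        return Err(format!("expected {dim} dimensions, got {}", values.len()));
    }
    Ok(Vector(values))
}

/// Cosine distance: 1 - cos(angle between a and b).
/// 0 means the same direction; larger means less similar.
/// Assumes both vectors have the same length and neither is all zeros.
fn cosine_distance(a: &Vector, b: &Vector) -> f32 {
    let dot: f32 = a.0.iter().zip(&b.0).map(|(x, y)| x * y).sum();
    let norm_a = a.0.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.0.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}
```

Rejecting a wrong-length literal at parse time is what a declared dimension such as `vector(3)` buys us: every stored value in the column can then be compared with every other without further length checks.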
40+
## Set up the environment
41+
42+
I will use the Rust language and a library called [`pgrx`][pgrx]. To install Rust, follow the instructions [here][install_rust]. Then run the following commands to install `cargo-pgrx`, a cargo sub-command that manages everything related to `pgrx`:
43+
44+
```sh
45+
$ cargo install --locked cargo-pgrx
46+
$ cargo pgrx --version # to verify that it gets installed
47+
```
48+
49+
Now we need a Postgres server to run and test our project. To make things easier, I will let `pgrx` install a brand new Postgres for me. At the time of writing, [Postgres 17 is the latest version][pg17_release], so I will use it.
50+
51+
`pgrx` builds Postgres from source, so you need to make sure these [requirements][build_pg_requirements] are satisfied. `pgrx` also has a page about its [system requirements][pgrx_system_requiremens], but the Postgres documentation is very thorough and deserves a read. Once you have everything set up, run:
52+
53+
```sh
54+
$ cargo pgrx init --pg17 download
55+
```
56+
57+
## The initial commit
58+
59+
Now let's write some code. `cargo pgrx`, just like `cargo`, provides a `new` sub-command to create new projects. Say we call our project `pg_vector_ext`:
60+
61+
```sh
62+
$ cargo pgrx new pg_vector_ext
63+
```
64+
65+
```sh
66+
$ cd pg_vector_ext
67+
$ tree .
68+
pg_vector_ext/
69+
├── Cargo.toml
70+
├── pg_vector_ext.control
71+
├── sql
72+
└── src
73+
    ├── bin
74+
    │   └── pgrx_embed.rs
75+
    └── lib.rs
76+
77+
4 directories, 4 files
78+
```
79+
80+
As we can see, `pgrx` creates some template files for us. For now, we only care about the `src/lib.rs` file.
81+
82+
```sh
83+
$ bat src/lib.rs
87+
use pgrx::prelude::*;
88+
89+
::pgrx::pg_module_magic!();
90+
91+
#[pg_extern]
92+
fn hello_pg_vector_ext() -> &'static str {
93+
    "Hello, pg_vector_ext"
94+
}
95+
96+
#[cfg(any(test, feature = "pg_test"))]
97+
#[pg_schema]
98+
mod tests {
99+
    use pgrx::prelude::*;
100+
101+
    #[pg_test]
102+
    fn test_hello_pg_vector_ext() {
103+
        assert_eq!("Hello, pg_vector_ext", crate::hello_pg_vector_ext());
104+
    }
105+
106+
}
107+
108+
/// This module is required by `cargo pgrx test` invocations.
109+
/// It must be visible at the root of your extension crate.
110+
#[cfg(test)]
111+
pub mod pg_test {
112+
    pub fn setup(_options: Vec<&str>) {
113+
        // perform one-off initialization when the pg_test framework starts
114+
    }
115+
116+
    #[must_use]
117+
    pub fn postgresql_conf_options() -> Vec<&'static str> {
118+
        // return any postgresql.conf settings that are required for your tests
119+
        vec![]
120+
    }
121+
}
123+
```
124+
125+
Ignoring the `tests` module (it is only there for testing), we can see that `pgrx` generates a function, `hello_pg_vector_ext()`. It is marked with `#[pg_extern]`, which makes it callable from SQL. To try it, run the project with `cargo pgrx run` (read the note first):
126+
127+
> Before running it, edit the `Cargo.toml` file: in the `features` section, change the default feature to `pg17`. Optionally, you can also remove the `pg*` features other than `pg17`, as they won't be used:
128+
>
129+
> ```toml
130+
> [features]
131+
> default = ["pg17"]
132+
> pg17 = ["pgrx/pg17", "pgrx-tests/pg17" ]
133+
> pg_test = []
134+
> ```
135+
136+
```sh
137+
$ cargo pgrx run
138+
```
139+
140+
This starts the Postgres 17 instance and connects to it via `psql`, where we can install our extension and run the function:
141+
142+
```sql
143+
pg_vector_ext=# CREATE EXTENSION pg_vector_ext;
144+
CREATE EXTENSION
145+
pg_vector_ext=# SELECT hello_pg_vector_ext();
146+
hello_pg_vector_ext
147+
----------------------
148+
Hello, pg_vector_ext
149+
(1 row)
150+
```
151+
152+
This is our first experiment with `pgrx` and also the first commit to the project. In the next post, I will implement the `vector` type so that Postgres can store vectors.
153+
154+
[pgvector]: https://github.com/pgvector/pgvector
155+
[vector_db_wikipedia]: https://en.wikipedia.org/wiki/Vector_database
156+
[pgrx]: https://github.com/pgcentralfoundation/pgrx
157+
[install_rust]: https://www.rust-lang.org/tools/install
158+
[pg17_release]: https://www.postgresql.org/about/news/postgresql-17-released-2936/
159+
[build_pg_requirements]: https://www.postgresql.org/docs/current/install-requirements.html
160+
[pgrx_system_requiremens]: https://github.com/pgcentralfoundation/pgrx/?tab=readme-ov-file#system-requirements
