Database

pgvector: Embeddings and vector similarity


pgvector is a PostgreSQL extension for vector similarity search. It can also be used for storing embeddings.

Learn more about Supabase's AI & Vector offering.

Concepts

Vector similarity

Vector similarity refers to a measure of the similarity between two related items. For example, if you have a list of products, you can use vector similarity to find similar products. To do this, you need to convert each product into a "vector" of numbers, using a mathematical model. You can use a similar model for text, images, and other types of data. Once all of these vectors are stored in the database, you can use vector similarity to find similar items.

Embeddings

This is particularly useful if you're building on top of OpenAI's GPT-3. You can create and store embeddings for retrieval augmented generation.

Usage

Enable the extension

  1. Go to the Database page in the Dashboard.
  2. Click on Extensions in the sidebar.
  3. Search for "vector" and enable the extension.

Usage

Create a table to store vectors


_10
create table posts (
_10
id serial primary key,
_10
title text not null,
_10
body text not null,
_10
embedding vector(384)
_10
);

Storing a vector / embedding

In this example we'll generate a vector using Transformer.js, then store it in the database using the Supabase client.


_21
import { pipeline } from '@xenova/transformers'
_21
const generateEmbedding = await pipeline('feature-extraction', 'Supabase/gte-small')
_21
_21
const title = 'First post!'
_21
const body = 'Hello world!'
_21
_21
// Generate a vector using Transformers.js
_21
const output = await generateEmbedding(body, {
_21
pooling: 'mean',
_21
normalize: true,
_21
})
_21
_21
// Extract the embedding output
_21
const embedding = Array.from(output.data)
_21
_21
// Store the vector in Postgres
_21
const { data, error } = await supabase.from('posts').insert({
_21
title,
_21
body,
_21
embedding,
_21
})

Specific usage cases

Queries with filtering

If you use an IVFFlat or HNSW index and naively filter the results based on the value of another column, you may get fewer rows returned than requested.

For example, the following query may return fewer than 5 rows, even if 5 corresponding rows exist in the database. This is because the embedding index may not return 5 rows matching the filter.


_10
SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

To get the exact number of requested rows, use iterative search to continue scanning the index until enough results are found.

More pgvector and Supabase resources