Prepare for the PgBouncer and IPv4 deprecations on 26th January 2024

Learn more

Implementing "seen by" functionality with Postgres

2022-07-18

33 minute read

tl;dr: Use HyperLogLog, it's a reasonable approach with great trade-offs and no large architectural liabilities. For a quick & dirty prototype, use hstore, which also performs the best with integer IDs.

The year is 2022. You're head DBA at the hot new social site, SupaBook... Your startup is seeing eye-boggling growth because everyone loves fitting their hot-takes in posts restricted to VARCHAR(256).

Why VARCHAR(256)? No particular reason, but you don't have time to get hung up on that or ask why -- you just found out that the priority this quarter is tracking content views across all posts in the app.

"It sounds pretty simple" a colleague at the meeting remarks -- "just an increment here and an increment there and we'll know which posts are seen the most on our platform". You start to explain why it will be non-trivial, but the meeting ends before you can finish.

Well, it's time to figure out how you're going to do it. There's been a complexity freeze at the company, so you're not allowed to bring in any new technology, but you don't mind that because for v1 you would have picked Postgres anyway. Postgres's open source pedigree, robust suite of features, stable internals, and awesome mascot Slonik make it a strong choice, and it's what you're already running.

(insert record scratch here)

Sure, this scenario isn't real, but it could be - that last part about Postgres definitely is. Let's see how you might solve this problem, as that imaginary DBA.

Experiment setup

We've got the following simple table layout:

basic table layout diagram

In SQL migration form:

hideCopy

_38
CREATE EXTENSION IF NOT EXISTS uuid-ossp;
_38
CREATE EXTENSION IF NOT EXISTS citext;
_38
_38
-- Create a email domain to represent and constraing email addresses
_38
CREATE DOMAIN email
_38
AS citext
_38
CHECK ( LENGTH(VALUE) <= 255 AND value ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$' );
_38
_38
COMMENT ON DOMAIN email is 'lightly validated email address';
_38
_38
-- Create the users table
_38
CREATE TABLE users (
_38
id bigserial PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
_38
uuid uuid NOT NULL DEFAULT uuid_nonmc_v1(),
_38
_38
email email NOT NULL,
_38
name text,
_38
about_html text,
_38
_38
created_at timestamptz NOT NULL DEFAULT NOW()
_38
);
_38
_38
-- Create the posts table
_38
CREATE TABLE posts (
_38
id bigserial PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
_38
uuid uuid NOT NULL DEFAULT uuid_nonmc_v1(),
_38
_38
title text,
_38
content text,
_38
main_image_src text,
_38
main_link_src text,
_38
_38
created_by bigint REFERENCES users(id),
_38
_38
last_hidden_at timestamptz,
_38
last_updated_at timestamptz,
_38
created_at timestamptz NOT NULL DEFAULT NOW()
_38
);

This basic setup has taken the (imaginary) company quite far -- even though the posts table has millions and millions of entries, Postgres chugs along and serves our queries with impressive speed and reliability. Scaling up is the new (and old) scaling out.

How should we do it?

Well we can't pat ourselves for our miraculous and suspiciously simple DB architecture all day, let's move on to the task at hand.

Like any good tinkerer we'll start with the simplest solutions and work our way up in complexity to try and get to something outstanding, testing our numbers as we go.

Try #1: The naive way, a simple counter on every Post

The easiest obvious way to do this is to maintain a counter on every tuple in the posts table. It's obvious, and it's almost guaranteed to work -- but maybe not work well.

The migration to make it happen isn't too difficult:

hideCopy

_10
BEGIN;
_10
_10
ALTER TABLE posts ADD COLUMN seen_by_count;
_10
_10
COMMENT ON COLUMN posts.seen_by_count
_10
IS 'simple count of users who have seen the post';
_10
_10
COMMIT;

There's one obvious glaring issue here -- what if someone sees the same post twice? Every page reload would cause inflated counts in the seen_by_count column, not to mention a lot of concurrent database updates (which isn't necessarily Postgres's forte to begin with).

Clearly there's a better way to do things but before that...

Writing a test suite before the CPUs get hot and heavy

How will we know which approach is better without numbers?! Measuring complexity and feeling can only get us so far -- we need to get some numbers that tell us the performance of the solution at the stated tasks -- we need benchmarks.

Before we can declare any solution the best, in particular we need a baseline!. The simplest possible incorrect solution (simply incrementing a counter on the Post) is probably a reasonable thing to use as a benchmark, so let's take a moment to write our testing suite.

Let's do this the simplest one might imagine:

  • Generate a large amount of users
    • Lets model for 1000, 10k, 100K, 1MM, and 10MM users
  • Generate an even larger amount of fake posts attributed to those users
  • Generate a description of "events" that describe which post was seen by whom, which we can replay.
    • We want the equivalent of an effect system or monadic computation, which is easier than it sounds -- we want to generate an encoding (JSON, probably) of what to do, without actually doing it
    • We'll just do consistent "as fast as we can" execution (more complicated analysis would burst traffic to be ab it closer to real life)

OK, let's roll our hands up and get it done:

Script: User seeding

Here's what that looks like:


_32
/**
_32
* Generate a list of synthetic users to be loaded into Postgres
_32
*
_32
* @param {object} args
_32
* @param {number} [args.count] number of users to generate
_32
* @param {number} [args.aboutHTMLWordCount] number of words to generate (lorem ipsum) for about_html (serves to add heft to tuples)
_32
* @param {string} [args.outputFilePath] output file path, if present this functoin returns void
_32
* @returns {any[][]} List of generated synthetic users
_32
*/
_32
export async function generateUsers(args) {
_32
const count = args.count || DEFAULT_USER_COUNT
_32
const aboutHTMLWordCount = args.aboutHTMLWordCount || DEFAULT_ABOUT_HTML_WORD_COUNT
_32
_32
const outputFilePath = args.outputFilePath
_32
if (!outputFilePath) {
_32
throw new Error('output file path must be specified')
_32
}
_32
_32
for (var id = 0; id < count; id++) {
_32
const user = {
_32
id,
_32
email: `user${id}@example.com`,
_32
name: `user ${id}`,
_32
about_html: fastLoremIpsum(aboutHTMLWordCount, 'w'),
_32
}
_32
_32
// Write the entries to disk (returning nothing)
_32
if (args.outputFilePath) {
_32
await appendFile(outputFilePath, `${JSON.stringify(user)}\n`)
_32
}
_32
}
_32
}

Nothing too crazy in there -- we generate a bunch of JSON, and force it out to disk. It's best to avoid trying to keep it in memory so we can handle much larger volumes than we might be able to fit in memory.

If you'd like to see the code, check out scripts/generate/users.js in the repo.

Script: Post seeding

Along with users, we need to generate posts that they can view. We'll keep it simple and take an amount of posts to make, generating from 0 to count of those.

It's very similar to the user generation code, with the caveat that we can take into account the 80/20 lurker/poster rule. here's what that looks like:

It's a bit long so if you'd like to see the code, check out scripts/generate/posts.js in the repo.

Script: action (API call) seeding/generation

This script is a bit tricky -- we need to inject some randomness in the performing of the following actions:

  • Record a new view of a post
  • Retrieve just the count of a single post
  • Retrieve all the users who saw a post

I've chosen to use autocannon so I needed to write a request generation script which looks like this:


_48
const process = require('process')
_48
_48
const POST_COUNT = process.env.TEST_POST_COUNT
_48
? parseInt(process.env.TEST_POST_COUNT, 10)
_48
: undefined
_48
const USER_COUNT = process.env.TEST_USER_COUNT
_48
? parseInt(process.env.TEST_USER_COUNT, 10)
_48
: undefined
_48
_48
/**
_48
* Request setup function for use with autocannon
_48
*
_48
* @param {Request} request
_48
* @returns {Request}
_48
*/
_48
function setupRequest(request) {
_48
// ENsure we have counts to go off of
_48
if (!POST_COUNT || !USER_COUNT) {
_48
throw new Error('Cannot setup request without valid post/user count!')
_48
}
_48
_48
// Pick a random post to do an operation on
_48
const postId = Math.floor(Math.random() * POST_COUNT)
_48
_48
// Choose pseudo-randomly whether to register a seen by or read seenby status
_48
const operationChoice = Math.floor(Math.random() * 10)
_48
if (operationChoice < 1) {
_48
// 10% of the time, get *all* the users
_48
request.method = 'GET'
_48
request.path = `/posts/${postId}/seen-by/users`
_48
} else if (operationChoice < 7) {
_48
// 60% of the time, get the count of seenby on a post
_48
request.method = 'GET'
_48
request.path = `/posts/${postId}/seen-by/count`
_48
} else {
_48
// 30% of the time, add a new seen-by entry
_48
const userId = Math.floor(Math.random() * USER_COUNT)
_48
_48
// Most of the time we'll be *setting* seen-by
_48
// And we'll get the count (so we can show it) later as well
_48
request.method = 'POST'
_48
request.path = `/posts/${postId}/seen-by/${userId}`
_48
}
_48
_48
return request
_48
}
_48
_48
module.exports = setupRequest

Nothing too crazy here, and some back of the envelope estimations on how often each operation would normally be called. These numbers could be tweaked more, but we should see a difference between approaches even if we messed up massively here.

If you'd like to see the code, check out scripts/setup-request.cjs in the repo.

Glue it all together

Once we're done we need to glue this all together into one script, with roughly this format:


_10
export default async function runBenchmark() {
_10
// Start the server
_10
// Reset before test
_10
// Generate & insert users
_10
// Generate & insert posts
_10
// Generate actions (API Calls) to run
_10
// Execute the API calls
_10
// Write JSON results to tmpdir
_10
// Stop the server
_10
}

If you want to see what the code actually ended up looking like, check out scripts/bench.js in the repo.

Along with the benchmark, we'll standardize on the following settings:


_11
export SEEN_BY_STRATEGY=simple-counter # or: simple-hstore, assoc-table, hll
_11
export TEST_USERS_JSON_PATH=/tmp/supabase-seen-by.users.json
_11
export TEST_POSTS_JSON_PATH=/tmp/supabase-seen-by.posts.json
_11
export TEST_POST_COUNT=1000
_11
export TEST_USER_COUNT=100000
_11
export TEST_DURATION_SECONDS=60
_11
_11
## Use custom postgres image built with hll extension (https://github.com/citusdata/postgresql-hll)
_11
## NOTE: `make db-custom-image` must be run beforehand
_11
#export DB_IMAGE=postgres-14.4-alpine-hll
_11
#export DB_IMAGE_TAG=latest

Our first run, on the naive solution

Alright, finally we're ready. Let's see what we get on our naive solution. We expect this to be pretty fast, because not only is it wrong, but it's just about the simplest thing you could do.

On my local machine, here's our baseline (output from autocannon):


_51
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
_51
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
_51
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
_51
│ Latency │ 0 ms │ 2 ms │ 6 ms │ 6 ms │ 2.03 ms │ 1.82 ms │ 23 ms │
_51
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
_51
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
_51
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
_51
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
_51
│ Req/Sec │ 297 │ 318 │ 389 │ 500 │ 391.24 │ 47.87 │ 297 │
_51
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
_51
│ Bytes/Sec │ 54.1 kB │ 57.9 kB │ 70.8 kB │ 91.1 kB │ 71.3 kB │ 8.72 kB │ 54.1 kB │
_51
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
_51
_51
Req/Bytes counts sampled once per second.
_51
# of samples: 60
_51
_51
┌────────────┬──────────────┐
_51
│ Percentile │ Latency (ms) │
_51
├────────────┼──────────────┤
_51
│ 0.001 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 0.01 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 0.1 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 1 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 2.5 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 10 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 25 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 50 │ 2 │
_51
├────────────┼──────────────┤
_51
│ 75 │ 3 │
_51
├────────────┼──────────────┤
_51
│ 90 │ 5 │
_51
├────────────┼──────────────┤
_51
│ 97.5 │ 6 │
_51
├────────────┼──────────────┤
_51
│ 99 │ 6 │
_51
├────────────┼──────────────┤
_51
│ 99.9 │ 9 │
_51
├────────────┼──────────────┤
_51
│ 99.99 │ 16 │
_51
├────────────┼──────────────┤
_51
│ 99.999 │ 23 │
_51
└────────────┴──────────────┘
_51
_51
23k requests in 60.02s, 4.28 MB read

As you might imagine, pretty darn good latency across all the requests.

Back to trying things out

Now that we've got a basic baseline of our tests, let's continue trying out ideas:

Try #2: Storing the users who did the "see"ing, with hstore

The next obvious thing (and probably a core requirement if we'd asked around), is knowing who viewed each post. Well if we need to know who, then we probably need to store some more information!

Postgres has native support for arrays and a data structure called a hstore, so let's try those. It's pretty obvious that having hundreds, thousands, or millions of entries in one of these data structures, inside a tuple isn't the greatest idea, but let's try it anyway and let the numbers speak for themselves.

Here's what the migration would look like:

hideCopy

_11
BEGIN;
_11
_11
CREATE EXTENSION IF NOT EXISTS hstore;
_11
_11
ALTER TABLE posts ADD COLUMN seen_count_hstore hstore
_11
NOT NULL DEFAULT ''::hstore;
_11
_11
COMMENT ON COLUMN posts.seen_count_hstore
_11
IS 'count of users that have seen the post, with hstore';
_11
_11
COMMIT;

hstore provides support for both GIST and GIN indices, but after reading the documentation we can conclude that we don't necessarily need those for the current set of functionality.

Caveats

Well as you might have imagined, this is obviously pretty bad and will eventually be hard to scale. If you expect only 0-50 entries in your column text[] is perfectly fine, but thousands or millions is another ballgame.

Thinking of how to scale this, a few ideas pop to mind:

Performance

OK, time to get on with it, let's see how it performs with an hstore:


_51
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
_51
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
_51
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
_51
│ Latency │ 0 ms │ 2 ms │ 5 ms │ 6 ms │ 2.15 ms │ 1.67 ms │ 16 ms │
_51
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
_51
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
_51
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
_51
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
_51
│ Req/Sec │ 287 │ 305 │ 348 │ 504 │ 369.12 │ 58.8 │ 287 │
_51
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
_51
│ Bytes/Sec │ 53.9 kB │ 56.9 kB │ 64.5 kB │ 92.5 kB │ 68.3 kB │ 10.7 kB │ 53.8 kB │
_51
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
_51
_51
Req/Bytes counts sampled once per second.
_51
# of samples: 60
_51
_51
┌────────────┬──────────────┐
_51
│ Percentile │ Latency (ms) │
_51
├────────────┼──────────────┤
_51
│ 0.001 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 0.01 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 0.1 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 1 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 2.5 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 10 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 25 │ 1 │
_51
├────────────┼──────────────┤
_51
│ 50 │ 2 │
_51
├────────────┼──────────────┤
_51
│ 75 │ 3 │
_51
├────────────┼──────────────┤
_51
│ 90 │ 5 │
_51
├────────────┼──────────────┤
_51
│ 97.5 │ 5 │
_51
├────────────┼──────────────┤
_51
│ 99 │ 6 │
_51
├────────────┼──────────────┤
_51
│ 99.9 │ 9 │
_51
├────────────┼──────────────┤
_51
│ 99.99 │ 9 │
_51
├────────────┼──────────────┤
_51
│ 99.999 │ 16 │
_51
└────────────┴──────────────┘
_51
_51
22k requests in 60.02s, 4.1 MB read

Not too far off! While we didn't try the pathological case(s) of millions of people liking the same post to hit breaking point, a slightly more random distribution seems to have done decently -- we actually have lower 99.999th percentile latency versus the simple counter.

An average of 2.15ms versus 2.05ms with the simpler counter is a ~4% increase in the average latency (though of course, the p99.999 is lower!).

Try #3: An Association table for remembering who liked what

A likely requirement from the original scenario that we've completely ignored is remembering which users liked a certain post to. The easiest solution here is an "associative" table like this one:

tables with associative table

In SQL:

hideCopy

_10
begin;
_10
_10
create table posts_seen_by_users (
_10
post_id bigint references posts (id),
_10
user_id bigint references users (id),
_10
seen_count bigint not null default 0 check (seen_count > 0),
_10
primary key (post_id, user_id)
_10
);
_10
_10
commit;

Caveats

In production, you're going to want to do a few things to make this even remotely reasonable long term:

  • PARTITION the table (consider using partition-friendly pg_partman)
  • Move old partitions off to slower/colder storage and maintain snapshots
  • Summarize older content that might be seen lots
  • Consider a partitioning key up front -- post IDs are probably a reasonable thing to use if they're sufficiently randomly distributed

These are good initial stop-gaps, but a realistic setup will have many problems and many more solutions to be discovered.

(It will be a recurring theme but this is a spot where we probably don't necessarily want to use stock Postgres but instead want to use tools like Citus Columnar Storage, ZedStore, or an external choice like ClickHouse).

Performance

Alright, enough dilly dally, let's run our test bench against this setup:


_51
┌─────────┬──────┬──────┬───────┬──────┬────────┬─────────┬───────┐
_51
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
_51
├─────────┼──────┼──────┼───────┼──────┼────────┼─────────┼───────┤
_51
│ Latency │ 0 ms │ 2 ms │ 8 ms │ 8 ms │ 2.5 ms │ 2.45 ms │ 30 ms │
_51
└─────────┴──────┴──────┴───────┴──────┴────────┴─────────┴───────┘
_51
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
_51
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
_51
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
_51
│ Req/Sec │ 238 │ 254 │ 321 │ 464 │ 326.52 │ 48.14 │ 238 │
_51
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
_51
│ Bytes/Sec │ 43.4 kB │ 46.3 kB │ 58.5 kB │ 84.5 kB │ 59.5 kB │ 8.77 kB │ 43.3 kB │
_51
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
_51
_51
Req/Bytes counts sampled once per second.
_51
# of samples: 60
_51
_51
┌────────────┬──────────────┐
_51
│ Percentile │ Latency (ms) │
_51
├────────────┼──────────────┤
_51
│ 0.001 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 0.01 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 0.1 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 1 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 2.5 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 10 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 25 │ 1 │
_51
├────────────┼──────────────┤
_51
│ 50 │ 2 │
_51
├────────────┼──────────────┤
_51
│ 75 │ 4 │
_51
├────────────┼──────────────┤
_51
│ 90 │ 7 │
_51
├────────────┼──────────────┤
_51
│ 97.5 │ 8 │
_51
├────────────┼──────────────┤
_51
│ 99 │ 8 │
_51
├────────────┼──────────────┤
_51
│ 99.9 │ 11 │
_51
├────────────┼──────────────┤
_51
│ 99.99 │ 25 │
_51
├────────────┼──────────────┤
_51
│ 99.999 │ 30 │
_51
└────────────┴──────────────┘
_51
_51
20k requests in 60.02s, 3.57 MB read

A little bit more divergence here -- 99.999%ile latency @ 30 which is almost double what it was for simple-hstore.

Average is coming in at 2.50ms which is 16% slower than simple-hstore and 21% slower than simple-counter.

Try #4: Getting a bit more serious: bringing out the HyperLogLog

rest of the owl

We'll just draw the rest of the owl now.

What's HyperLogLog you ask? Well it's just a probabilistic data structure! Don't worry if you've never heard of it before, it's a reasonably advanced concept.

You may have heard of Bloom Filters and they're somewhat related but they're not quite a great fit for the problem we're solving since we want to know how many people have seen a particular post. Knowing whether one user has seen a particular post is useful too -- but not quite what we're solving for here (and we'd have to double-check our false positives anyway if we wanted to be absolutely sure).

HyperLogLog provides a probabilistic data structure that is good at counting distinct entries, so that means that the count will not be exact, but be reasonably close (depending on how we tune). We won't have false positives (like with a bloom filter) -- we'll have a degree of error (i.e. the actual count may be 1000, but the HLL reports 1004).

We have to take this into account on the UI side but and maybe retrieve the full count if anyone ever really needs to know/view individual users that have seen the content, so we can fall back to our association table there.

Given that every second there are about 6000 tweets on Twitter(!), this is probably one of the only solutions that could actually work at massive scale with the limitations we've placed on ourselves.

Here's what that looks like in SQL:

hideCopy

_11
BEGIN;
_11
_11
CREATE EXTENSION IF NOT EXISTS hll;
_11
_11
ALTER TABLE posts ADD COLUMN seen_count_hll hll
_11
NOT NULL DEFAULT hll_empty();
_11
_11
COMMENT ON COLUMN posts.seen_count_hll
_11
IS 'HyperLogLog storing user IDs';
_11
_11
COMMIT;

Here we need the citus/postgresql-hll extension, which is generously made (truly) open source by citusdata.

NOTE that we still have access to the association table -- and while we still insert rows into it, we can drop the primary key index, and simply update our HLL (and leave ourselves a note on when we last updated it).

Caveats

There's not much to add to this solution, as the heavy lifting is mostly done by postgresql-hll, but there's one big caveat:

  • This approach will need a custom Postgres image for this, since hll is not an official contrib module

There are also a few optimizations that are easy to imagine:

  • Batching inserts to the association table (storing them in some other medium in the meantime -- local disk, redis, etc)
  • Writing our association table entries in a completely different storage medium altogether (like object storage) and use Foreign Data Wrappers and pg_cron and delay or put off processing all together

Performance

The most complicated solution by far, let's see how it fares:


_51
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
_51
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
_51
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
_51
│ Latency │ 0 ms │ 2 ms │ 6 ms │ 6 ms │ 2.28 ms │ 2.03 ms │ 59 ms │
_51
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
_51
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
_51
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
_51
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
_51
│ Req/Sec │ 272 │ 285 │ 351 │ 456 │ 353.05 │ 45.13 │ 272 │
_51
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
_51
│ Bytes/Sec │ 49.5 kB │ 51.9 kB │ 63.9 kB │ 83.1 kB │ 64.3 kB │ 8.22 kB │ 49.5 kB │
_51
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
_51
_51
Req/Bytes counts sampled once per second.
_51
# of samples: 60
_51
_51
┌────────────┬──────────────┐
_51
│ Percentile │ Latency (ms) │
_51
├────────────┼──────────────┤
_51
│ 0.001 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 0.01 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 0.1 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 1 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 2.5 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 10 │ 0 │
_51
├────────────┼──────────────┤
_51
│ 25 │ 1 │
_51
├────────────┼──────────────┤
_51
│ 50 │ 2 │
_51
├────────────┼──────────────┤
_51
│ 75 │ 4 │
_51
├────────────┼──────────────┤
_51
│ 90 │ 6 │
_51
├────────────┼──────────────┤
_51
│ 97.5 │ 6 │
_51
├────────────┼──────────────┤
_51
│ 99 │ 6 │
_51
├────────────┼──────────────┤
_51
│ 99.9 │ 9 │
_51
├────────────┼──────────────┤
_51
│ 99.99 │ 28 │
_51
├────────────┼──────────────┤
_51
│ 99.999 │ 59 │
_51
└────────────┴──────────────┘
_51
_51
21k requests in 60.03s, 3.86 MB read

Another somewhat nuanced degradation in performance -- while the 99.99%ile latency was nearly 2x higher, the average latency was actually lower than the assoc-table approach @ 2.28ms.

The average latency on the HLL approach is 11% worse than simple-counter, 6% worse than simple-hstore, and faster than assoc-table alone, which is an improvement.

Oh, the other places we could go

One of the great things about Postgres is it's expansive ecosystem -- while Postgres may (and frankly should not) beat the perfect specialist tool for your use case, it often does an outstanding job in the general case.

Let's look into some more experiments that could be run -- maybe one day in the future we'll get some numbers behind these (community contributions are welcome!).

Incremental view maintenance powered by pg_ivm

If you haven't heard about pg_ivm it's an extension for handling Incremental View Maintenance -- updating VIEWs when underlying tables change.

IVM is a hotly requested feature whenever views (particularly materialized views) are mentioned, so there has been much fanfare to it's release.

There are a couple advantages we could gain by using pg_ivm:

  • Ability to time constrain calculations (newer posts which are more likely to be seen can exist in instant-access views)
  • We could theoretically remove the complicated nature of the HLL all together by using COUNT with IVM

pg_ivm is quite new and cutting edge but looks to be a great solution -- it's worth giving a shot someday.

Doing graph computations with AGE

As is usually the case in academia and practice, we can make our problem drastically easier by simply changing the data structures we use to model our problem!

One such reconfiguration would be storing the information as a graph:

graph of seen by relation

As you might imagine, finding the number of "seen-by" relations would simply be counting the number of edges out of one of the nodes!

Well, the Postgres ecosystem has us covered here too! AGE is an extension that allows you to perform graph related queries in Postgres.

We won't pursue it in this post but it would be a great way to model this problem as well -- thanks to the extensibility of Postgres, this data could live right next to our normal relational data as well.

So what's the best way to do it?

OK, so what's the answer at the end of the day? What's the best way to get to that useful v1? Here are the numbers:

latency graph showing simple-hstore,hll,hll,and assoc table in speed order

In tabular form:

ApproachAvg (ms)99%ile (ms)99.999%ile (ms)
simple-counter2.03623
simple-hstore2.15616
assoc-table2.5830
hll2.16727

If we go strictly with the data, the best way looks to be the hstore-powered solution, but I think the HLL is probably the right choice.

The HLL results were quite variable -- some runs were faster than others, so I've taken the best of 3 runs.

Even though the data says hstore, knowing that posts will be seen by more and more people over time, I might choose the HLL solution for an actual implementation. It's far less likely to pose a bloated row problem, and it has the absolute correctness (and later recall) of the assoc-table solution, while performing better over all (as you can imagine, no need to COUNT rows).

Another benefit of the HLL solution is that PostgreSQL tablespaces allow us to put the association table on a different, slower storage mechanism, and keep our posts table fast. Arguably in a real system we might have the HLL in something like redis but for a v1, it looks like Postgres does quite well!

Wrap-up

I hope you enjoyed this look down the trunk hole, and you've got an idea of how to implement solutions to surprisingly complex problems like this one with Postgres.

As usual, Postgres has the tools to solve the problem reasonably well (if not completely) before you reach out for more complicated/standalone solutions.

See any problems with the code, solutions that haven't been tried? -- reach out, or open an issue!

More Postgres resources

Share this article

Build in a weekend, scale to millions