Supavisor

Supavisor - Postgres connection pooler

Overview

Supavisor is a scalable, cloud-native Postgres connection pooler. A Supavisor cluster is capable of proxying millions of Postgres end-client connections into a stateful pool of native Postgres database connections.

For database managers, Supavisor simplifies the task of managing Postgres clusters by providing easy configuration of highly available Postgres clusters (todo).

Motivation

We have several goals with Supavisor:

  • Zero-downtime scaling: we want to scale Postgres server compute with zero downtime. To do this, we need an external Pooler that can buffer and re-route requests while the resizing operation is in progress.
  • Handling modern connection demands: We need a Pooler that can absorb millions of connections. We often see developers connecting to Postgres from Serverless environments, and so we also need something that works with both TCP and HTTP protocols.
  • Efficiency: Our customers pay for database processing power, and our goal is to maximize their database capacity. While PgBouncer is resource-efficient, it still consumes some resources on the database instance. By moving connection pooling to a dedicated cluster adjacent to tenant databases, we can free up additional resources to better serve customer queries.

Architecture

Supavisor was designed to work in a cloud computing environment as a highly available cluster of nodes. Tenant configuration is stored in a highly available Postgres database. Configuration is loaded from the Supavisor database when a tenant connection pool is initiated.

Connection pools are dynamic. When a tenant client connects to the Supavisor cluster the tenant pool is started and all connections to the tenant database are established. The process ID of the new tenant pool is then distributed to all nodes of the cluster and stored in an in-memory key-value store. Subsequent tenant client connections live on the inbound node but connection data is proxied from the pool node to the client connection node as needed.

Because the number of Postgres connections is constrained, only one tenant connection pool should be alive in a Supavisor cluster. If two simultaneous client connections each start a pool, the pool process IDs are distributed across the cluster and one of the pools is eventually shut down gracefully.
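
A minimal sketch of that registration flow, assuming the syn library mentioned in the issues below serves as the distributed registry; the scope and key names are illustrative:

# Illustrative only: register a tenant pool once per cluster and fall back to
# an existing registration if another node won the race.
tenant_id = "dev_tenant"
pool_pid = self()

:ok = :syn.add_node_to_scopes([:tenants])

case :syn.register(:tenants, {:pool, tenant_id}, pool_pid) do
  :ok ->
    {:ok, pool_pid}

  {:error, :taken} ->
    # Another node already started this tenant's pool; use that one and let
    # the duplicate shut down gracefully.
    {existing_pid, _meta} = :syn.lookup(:tenants, {:pool, tenant_id})
    {:ok, existing_pid}
end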

The dynamic nature of tenant database connection pools enables high availability in the event of node outages. Pool processes are monitored by each node. If a node goes down, that process ID is removed from the cluster. Tenant clients will then start a new pool automatically as they reconnect to the cluster.

This design enables blue-green or rolling deployments as upgrades require. A single-VPC, multiple-availability-zone topology is possible and can provide greater redundancy once load balancing queries across read replicas is supported (todo).

Docs

Features

  • Fast
    • Within 90% of PgBouncer's throughput when running pgbench locally
  • Scalable
    • 1 million Postgres connections on a cluster
    • 250_000 idle connections on a single 16-core node with 64GB of RAM
  • Multi-tenant
    • Connect to multiple different Postgres instances/clusters
  • Single-tenant
    • Easy drop-in replacement for PgBouncer
  • Pool mode support per tenant
    • Transaction
  • Cloud-native
    • Cluster-able
    • Resilient during cluster resizing
    • Supports rolling and blue/green deployment strategies
    • NOT run in a serverless environment
    • NOT dependent on Kubernetes
  • Observable
    • Easily understand throughput by tenant, tenant database or individual connection
    • Prometheus /metrics endpoint
  • Manageable
    • OpenAPI spec at /api/openapi
    • SwaggerUI at /swaggerui
  • Highly available
    • When deployed as a Supavisor cluster and a node dies, connection pools are quickly spun up or already available on other nodes when clients reconnect
  • Connection buffering
    • Brief connection buffering for transparent database restarts or failovers

Future Work

  • Load balancing
    • Queries can be load balanced across read-replicas
    • Load balancing is independent of Postgres high-availability management (see below)
  • Query caching
    • Query results are optionally cached in the pool cluster and returned before hitting the tenant database
  • Session pooling
    • Like PgBouncer
  • Multi-protocol Postgres query interface
    • Postgres binary
    • HTTPS
    • Websocket
  • Postgres high-availability management
    • Primary database election on primary failure
    • Health checks
    • Push button read-replica configuration
  • Config as code
    • Not only for the supavisor cluster but tenant databases and tenant database clusters as well
    • Pulumi / terraform support

Benchmarks

Local Benchmarks

  • Running pgbench on PgBouncer (transaction mode/pool size 60)
PGPASSWORD=postgres pgbench -M extended --transactions 100 --jobs 10 --client 100 -h localhost -p 6452 -U postgres postgres
pgbench (15.2, server 14.6 (Debian 14.6-1.pgdg110+1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: extended
number of clients: 100
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 510.310 ms
initial connection time = 31.388 ms
tps = 195.959361 (without initial connection time)
  • Running pgbench on Supavisor (pool size 60, no logs)
PGPASSWORD=postgres pgbench -M extended --transactions 100 --jobs 10 --client 100 -h localhost -p 7654 -U postgres.localhost postgres
pgbench (15.2, server 14.6 (Debian 14.6-1.pgdg110+1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: extended
number of clients: 100
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 528.463 ms
initial connection time = 178.591 ms
tps = 189.228103 (without initial connection time)

Load Test

Supavisor load test virtual users chart

Supavisor load test qps chart

  • Supavisor two-node cluster
    • 64 vCPU / 246 GB RAM
    • Ubuntu 22.04.2 aarch64
  • 1_003_200 concurrent client connections
  • 20_000+ QPS
  • 400 tenant Postgres connections
  • select * from (values (1, 'one'), (2, 'two'), (3, 'three')) as t (num,letter);
  • ~50% CPU utilization (pool owner node)
  • 7.8G RAM usage

Acknowledgements

José Valim and the Dashbit team were incredibly helpful in informing the design decisions for Supavisor.

Inspiration

Commercial Inspiration

supavisor's Issues

Support limiting for postgres-related statistics

Chore

Describe the chore

Many providers (like Supabase) require database requests to be throttled or rejected based on certain thresholds. These are typically:

  • Max storage
  • Egress over a certain period
  • Total egress per request
  • CPU (once it gets high, perhaps some sort of throttling)

The most important/easiest is the first: Max Storage. Supabase typically puts a database into read-only mode once it gets close to disk limits. This could be better handled in the proxy, because often a user starts uploading terabytes of data and their database fills up too fast to issue any commands.

For the implementation, it could be good to poll the database statistics periodically and store them with the tenant config or in some faster cache.
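
One possible shape for that polling, sketched with Postgrex and Cachex (both referenced elsewhere in this document); the query, cache name, and key are illustrative:

# Illustrative sketch: poll the tenant database size and cache it for limit checks.
# Assumes a Cachex cache named :tenant_stats has already been started.
tenant_id = "dev_tenant"

{:ok, conn} =
  Postgrex.start_link(
    hostname: "localhost",
    username: "postgres",
    password: "postgres",
    database: "postgres"
  )

%Postgrex.Result{rows: [[db_size_bytes]]} =
  Postgrex.query!(conn, "select pg_database_size(current_database())", [])

Cachex.put(:tenant_stats, {tenant_id, :db_size}, db_size_bytes)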

Additional context

Consolidate client and server ends of the protocol in a single module

Since you're developing both client and server ends of the protocol I'd consider having a single module that does encoding and decoding:

defmodule PgEdge.Protocol do
  def encode_startup_message(input)

  def decode_startup_message(input)
end

I think keeping the code collocated will make it easier to maintain and make sure it is correct. Another benefit is keeping "symmetry", ideally we'd have:

assert message == message |> encode_startup_message() |> decode_startup_message()

Since you are already using :pgo_protocol.encode_startup_message, I'd definitely continue doing so. I'd just wrap it with the extra function to ensure the interface I mentioned above. Another benefit is if you ever choose to use a different underlying library or hand-roll encoding, the rest of the system would be unaffected because they'd only know about Protocol.encode_startup_message function.

I think another reason it is fine to keep a single module is we won't have to implement the entirety of the pg wire protocol, just the bits that we need which are around establishing the connection (and transaction control if we choose to go down that path), it is fine if the rest of the protocol is opaque to our proxy and we just forward it as is without any decoding/re-encoding.

Originally posted by @wojtekmach in #4 (comment)
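
To make the suggested round-trip property concrete, here is a self-contained sketch of encoding and decoding the Postgres StartupMessage; the module name is illustrative and this is not Supavisor's actual implementation:

defmodule ProtocolSketch do
  @protocol_version 196_608  # Postgres wire protocol 3.0

  def encode_startup_message(params) do
    # key\0value\0 pairs, terminated by a final null byte.
    pairs = Enum.map_join(params, fn {k, v} -> <<k::binary, 0, v::binary, 0>> end)
    body = <<@protocol_version::32, pairs::binary, 0>>
    <<byte_size(body) + 4::32, body::binary>>
  end

  def decode_startup_message(<<_len::32, @protocol_version::32, rest::binary>>) do
    rest
    |> :binary.split(<<0>>, [:global, :trim_all])
    |> Enum.chunk_every(2)
    |> Enum.map(fn [k, v] -> {k, v} end)
  end
end

# The "symmetry" property from the comment above:
params = [{"user", "postgres"}, {"database", "postgres"}]
^params = params |> ProtocolSketch.encode_startup_message() |> ProtocolSketch.decode_startup_message()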

Initial metrics

  • Network usage of all tenants
    • Poll the inet Erlang module periodically (see the sketch after this list)
  • Total queries, etc.
  • Pool stats
    • Actual db pool size
    • Connected clients
    • Pool checkout queue time

Route to list all tenants raises an error when there are tenants to list

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

Route to get tenants http://localhost:4000/api/tenants raises an error when there are tenants to list

To Reproduce

Steps to reproduce the behavior, please provide code snippets or a repository:

  1. Launch the Supavisor application with make dev
  2. Add a tenant
curl -X PUT   'http://localhost:4000/api/tenants/dev_tenant'   --header 'Accept: application/json'   --header 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJvbGUiOiJhbm9uIiwiaWF0IjoxNjQ1MTkyODI0LCJleHAiOjE5NjA3Njg4MjR9.M9jrxyvPLkUxWgOYSf5dNdJ8v_eRrq810ShFRT8N-6M'   --header 'Content-Type: application/json'   --data-raw '{
  "tenant": {
    "db_host": "localhost",
    "db_port": 6432,
    "db_database": "postgres",
    "users": [
      {
        "db_user": "postgres",
        "db_password": "postgres",
        "pool_size": 20,
        "mode_type": "transaction"
      }
    ]
  }
}'
  3. List all tenants
curl -X 'GET'   'http://localhost:4000/api/tenants' --header 'Accept: application/json'   --header 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJvbGUiOiJhbm9uIiwiaWF0IjoxNjQ1MTkyODI0LCJleHAiOjE5NjA3Njg4MjR9.M9jrxyvPLkUxWgOYSf5dNdJ8v_eRrq810ShFRT8N-6M'   --header 'Content-Type: application/json'
  4. The error will be raised right away

Expected behavior

No error should be raised and info on all tenants should be displayed

Screenshots

(screenshot of the error in the original issue)

System information

  • OS: Ubuntu 22.04.1 LTS
  • Supavisor's last commit 443c46b

Additional context

N/A

Comment

If this is not a bug, please enlighten me on what I did wrong, as I just followed the steps in Installation-and-Usage

Supavisor needs tenant's password to do the routing using its proxy

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

After registering a tenant with its credentials using the PUT API, I expected to use the proxy by passing the user prefix, the tenant id, and the DB password of the Supavisor DB; Supavisor would then internally retrieve the tenant's credentials and do the expected routing for us.

Right now I need to pass the user prefix, the tenant id, and the password of the tenant's database. This doesn't make sense, because we already have this password saved, right?

To Reproduce

curl --location --request PUT 'localhost:4000/api/tenants/84f9b050-6e82-4fda-a904-24958a7d184d' \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2ODAxNjIxNTR9.U9orU6YYqXAtpF8uAiw6MS553tm4XxRzxOhz2IwDhpY' \
--data '{
  "tenant": {
    "db_database": "tenant1",
    "db_host": "localhost",
    "db_port": 5432,
    "users": [
      {
        "db_password": "tenant1",
        "db_user": "tenant1",
        "mode_type": "transaction",
        "pool_size": 10
      }
    ]
  }
}'

  • Try to access the proxy with the password of the supavisor DB:
psql postgresql://tenant1.84f9b050-6e82-4fda-a904-24958a7d184d:password@localhost:7654/tenant1

Will fail with:

psql: error: connection to server at "localhost" (::1), port 7654 failed: error received from server in SCRAM exchange: Invalid client signature

  • Try to access the proxy with the password of the tenant DB:
psql postgresql://tenant1.84f9b050-6e82-4fda-a904-24958a7d184d:tenant1@localhost:7654/tenant1

Will work

Expected behavior

Be able to connect passing only the password of supavisor's DB.

System information

  • OS: macOS & Linux
  • Version of supavisor: #96

Add suggestion to set FLY_APP_NAME if you're not running on Fly

Improve documentation

Link

https://github.com/supabase/supavisor/wiki/Installation-and-Usage

Describe the problem

This page assumes the deployment will be made using Fly, but during some local tests using MIX_ENV=prod, we got the error "Invalid node name! Please check your configuration" when starting the application.

Describe the improvement

Clarify that if you're not running on Fly, you need to set the FLY_APP_NAME env var yourself (this var is set automatically when running on Fly).

Additional context

env.sh.eex

Full error due to the missing FLY_APP_NAME:

=INFO REPORT==== 23-May-2023::18:42:21.198353 ===
Invalid node name!
Please check your configuration

=SUPERVISOR REPORT==== 23-May-2023::18:42:21.198873 ===
    supervisor: {local,net_sup}
    errorContext: start_error
    reason: {'EXIT',nodistribution}
    offender: [{pid,undefined},
               {id,net_kernel},
               {mfargs,{net_kernel,start_link,
                                   [#{clean_halt => true,name => '@127.0.0.1',
                                      name_domain => longnames,
                                      supervisor => net_sup}]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,2000},
               {child_type,worker}]

=CRASH REPORT==== 23-May-2023::18:42:21.200453 ===
  crasher:
    initial call: net_kernel:init/1
    pid: <0.2015.0>
    registered_name: []
    exception exit: {error,badarg}
      in function  gen_server:init_it/6 (gen_server.erl, line 835)
    ancestors: [net_sup,kernel_sup,<0.2001.0>]
    message_queue_len: 0
    messages: []
    links: [<0.2012.0>]
    dictionary: [{longnames,true}]
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 28
    reductions: 327
  neighbours:

=SUPERVISOR REPORT==== 23-May-2023::18:42:21.200885 ===
    supervisor: {local,kernel_sup}
    errorContext: start_error
    reason: {shutdown,
                {failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}
    offender: [{pid,undefined},
               {id,net_sup},
               {mfargs,{erl_distribution,start_link,[]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,infinity},
               {child_type,supervisor}]

=CRASH REPORT==== 23-May-2023::18:42:21.205684 ===
  crasher:
    initial call: application_master:init/4
    pid: <0.2000.0>
    registered_name: []
    exception exit: {{shutdown,
                         {failed_to_start_child,net_sup,
                             {shutdown,
                                 {failed_to_start_child,net_kernel,
                                     {'EXIT',nodistribution}}}}},
                     {kernel,start,[normal,[]]}}
      in function  application_master:init/4 (application_master.erl, line 142)
    ancestors: [<0.1999.0>]
    message_queue_len: 1
    messages: [{'EXIT',<0.2001.0>,normal}]
    links: [<0.1999.0>,<0.1998.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 28
    reductions: 167
  neighbours:

=INFO REPORT==== 23-May-2023::18:42:21.207911 ===
    application: kernel
    exited: {{shutdown,
                 {failed_to_start_child,net_sup,
                     {shutdown,
                         {failed_to_start_child,net_kernel,
                             {'EXIT',nodistribution}}}}},
             {kernel,start,[normal,[]]}}
    type: permanent

{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{kernel,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{kernel,start,[normal,[]]}}})

Crash dump is being written to: erl_crash.dump...done

Update proxy architecture for better performance

As I alluded to in a couple of comments, I'm afraid the current architecture is not going to be as efficient as possible due to the overhead of copying messages through many processes. AFAIR poolboy sends a message to the pool process and then to a worker process, so that's two copies right there. For performance reasons we definitely need to rule out poolboy.

Whether we should use NimblePool or another pool library or roll something custom really depends on the architecture you have in mind. If you're sure you need pools, then we're looking at implementing transaction control, which is not going to be trivial. While we very well might end up exactly there, with pools, we think there's a different starting point (we definitely need to start somewhere!): no pools, no extra processes. That is, in the ranch listener callback we'd connect to the actual pg backend instance. That ranch process would be the one owning the socket, the only one interacting with it, and upon receiving data it would just :gen_tcp.send it. We'd never copy data, just pass it forward. I think even with this simplistic architecture we can implement a lot of features already:

  • we'd know when the pg backend disconnects (:tcp_closed message) and can act accordingly
  • we can keep a list of known pg backend instances and can choose which one to connect to: randomly, round-robin, etc.

The latter point in particular is, I think, pretty compelling: our proxy can already help spread the load.

What do you think?

Originally posted by @wojtekmach in #4 (review)
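
For reference, a minimal sketch of the "no pools, no extra processes" approach described above, assuming ranch 2.x with the ranch_tcp transport; the module name and upstream address are illustrative:

defmodule ProxySketch do
  # The ranch connection process owns the client socket, opens one upstream
  # Postgres connection, and forwards bytes both ways without copying them
  # through intermediate processes.
  @behaviour :ranch_protocol

  def start_link(ref, transport, opts) do
    {:ok, :proc_lib.spawn_link(__MODULE__, :init, [ref, transport, opts])}
  end

  def init(ref, transport, _opts) do
    {:ok, client} = :ranch.handshake(ref)
    {:ok, upstream} = :gen_tcp.connect({127, 0, 0, 1}, 5432, [:binary, active: true])
    transport.setopts(client, active: true)
    loop(transport, client, upstream)
  end

  defp loop(transport, client, upstream) do
    receive do
      {:tcp, ^client, data} ->
        :ok = :gen_tcp.send(upstream, data)
        loop(transport, client, upstream)

      {:tcp, ^upstream, data} ->
        :ok = transport.send(client, data)
        loop(transport, client, upstream)

      {:tcp_closed, _socket} ->
        # Either side went away; tear down both ends.
        :gen_tcp.close(upstream)
        transport.close(client)
    end
  end
end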

db connection pool should live on one node of the cluster

edit: decided to go with the same architecture as Realtime

Except:

  • Spin up 25% of the Postgres max connection limit per pool
    • v0.0.0 will just use value from metadata db
    • v1.0.0 spin up 10 on init connection and async query how many to actually spin up
  • If node_a can't connect to node_b start on self()
    • v1.0.0 maybe have a pool on two nodes, if rpc doesn't work to acquire a connection use the other node

Should we do it like Supabase Realtime?

Supabase Realtime Overview

  • uses syn library for distributed registry
  • deterministically pick a node by tenant id (e.g. :erlang.phash/2; see the sketch at the end of this issue)
  • picks fly region mapped to aws region where tenant db lives
  • websocket connection comes into fra and it gets registered with syn
  • node in iad knows via syn to send wal events to websocket connection process on fra node

erlang-routes-conns

Or should we have some tenant aware proxy which routes connections to a node where tenant db pool lives?

Option with HAProxy

  • HAProxy somehow knows which node to route connections to

load-balancer-routes-conns

Option 3

  • We don't care where the client connection goes, and database connection pools get spun up on each Erlang node a client connects to.

tenant-pool-on-each-erlang-node
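
For reference, a sketch of the deterministic pick-a-node-by-tenant idea from the Realtime overview above (using :erlang.phash2/2; names are illustrative):

# Every node computes the same owner node for a given tenant id.
tenant_id = "dev_tenant"
nodes = Enum.sort([node() | Node.list()])
owner_node = Enum.at(nodes, :erlang.phash2(tenant_id, length(nodes)))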

Race condition on `Process.unlink` when `mode` is `transaction`

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

In handle_event({:call, from}, {:client_call, bin, ready?}, data) (handler for database responses), if the client's mode is transaction and we received a ready?=true from the database, we check the db_pid back in and set it to nil.

The next time the db_pid is checked out and set in state is when the client makes another request. However, there are situations where the database will call the ClientHandler again before the client makes another request.

i.e. receiving ReadyForQuery is not a guarantee that the database won't send data until the next client request.

Currently, handle_db_pid fails when this happens, as we're trying to run Process.unlink(nil).

The solution is to ignore a nil db_pid in handle_db_pid. I think that's OK? But I don't know if getting into this state was intended behavior.

To Reproduce

What seems to be happening:

  1. Db sends a ReadyForQuery after finishing the last query.
  2. Client sends close and sync, seemingly to "wrap up" the last query. In the same bin, Client also sends parse, describe, sync - i.e., the next query.
  3. Db sends back close_complete then ReadyForQuery in its own bin.
  4. Then, it sends back the response to the subsequent query.

This is happening with the Prisma ORM client.

Expected behavior

ClientHandler does not crash when Db responds >1 time per client query.

Initial tests

  • if syn has two connection pools alive on a cluster, the first connection pool should win

`server_version` not sent in parameter_status at end of authentication

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

Many clients will error when connecting to Supavisor. For example, consider Postgrex:

14:05:01.372 mfa=:gen_statem.error_info/7 pid=<0.1963.0> [error] GenStateMachine #PID<0.1963.0> terminating
** (FunctionClauseError) no function clause matching in String.split/3
    (elixir 1.14.3) lib/string.ex:479: String.split(nil, " ", [parts: 2])
    (postgrex 0.16.5) lib/postgrex/utils.ex:68: Postgrex.Utils.parse_version/1
    (postgrex 0.16.5) lib/postgrex/protocol.ex:1019: Postgrex.Protocol.bootstrap_send/3
    (postgrex 0.16.5) lib/postgrex/protocol.ex:661: Postgrex.Protocol.handshake/2
    (postgrex 0.16.5) lib/postgrex/protocol.ex:166: Postgrex.Protocol.connect_endpoints/6
    (db_connection 2.5.0) lib/db_connection/connection.ex:90: DBConnection.Connection.handle_event/4
    (stdlib 4.3.1) gen_statem.erl:1428: :gen_statem.loop_state_callback/11
    (stdlib 4.3.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3

This is because after sending AuthenticationOk, the server is expected to send ParameterStatus. Supavisor does this here.

However, there's a race condition. If the DbHandler hasn't finished its authentication flow, the Manager will not have the parameter_status set -- it will be its default []. This means clients will be sent nothing for ParameterStatus, which seems to cause errors in eg Postgrex.

Once parameter_status is set (the DbHandler has finished authenticating and has set that state in Manager), things work normally.

To Reproduce

Run Postgrex.start_link against a cold Supavisor instance.

Expected behavior

Postgrex.start_link starts without any errors.

Implementation

I'm happy to help with implementation, but I struggled a bit with design. I considered having Manager.get_parameter_status hang until parameter_status was set. But this feels like a workaround. If DbHandler fails to connect (auth is actually bad), ClientHandler will hang inexplicably.

I think we really want the ClientHandler to be intimately connected to the initial connection process to the database. Not only does it care about the ParameterStatus coming from the db. It should also proxy an error to the client if there is a login error (host actually not reachable, password is bad). I wonder if it should run the initial auth process against the db? Or, maybe, if the Manager was just booted (this is a fresh connection pool), it should await some startup process which includes running auth and grabbing ParameterStatus from the db.

`ClientHandler` and `DbHandler` can deadlock

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

ClientHandler uses GenServer.call to send messages to DbHandler. Likewise, DbHandler uses GenServer.call to send database responses back to ClientHandler.

Alas, this can deadlock.

To Reproduce

I can reliably reproduce when running a bunch of heavy queries with a lot of data each direction. But basically, the two sockets just need to receive data at roughly the same time for a deadlock to occur. You can increase chances of it happening by introducing a small wait in either handle_event.

Ideas

Perhaps DbHandler can fire-and-forget its messages to ClientHandler (e.g. cast)?

Alternatively, one of them probably needs to spin up a process to send data to the other one (in order to async hear back if it went OK).
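
A sketch of the cast-based option, assuming ClientHandler owns the client socket; the module shape is illustrative, not Supavisor's actual handler:

defmodule ClientHandlerSketch do
  use GenServer

  def init(socket), do: {:ok, socket}

  # DbHandler would send database bytes with:
  #   GenServer.cast(client_handler, {:db_data, bin})
  # Neither process blocks waiting on the other, so the call/call deadlock
  # described above cannot occur.
  def handle_cast({:db_data, bin}, socket) do
    :ok = :gen_tcp.send(socket, bin)
    {:noreply, socket}
  end
end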

tests: try to reduce mocking

This is a separate discussion (apologies, I don't think we had one about it!), but while mocking is pretty convenient here, I think it provides less confidence and a more fragile (totally coupled to the implementation) test suite. Since it's trivial to spawn TCP sockets on the BEAM, I'd consider doing just that and testing against them.

Originally posted by @wojtekmach in #124 (comment)
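
A sketch of that approach: an ExUnit test that stands up a throwaway TCP "backend" with :gen_tcp instead of a mock:

defmodule RealSocketTest do
  use ExUnit.Case, async: true

  test "talks to a fake backend over a real socket instead of a mock" do
    # Fake "database" that echoes whatever it receives.
    {:ok, listen} = :gen_tcp.listen(0, [:binary, active: false, reuseaddr: true])
    {:ok, port} = :inet.port(listen)

    spawn_link(fn ->
      {:ok, sock} = :gen_tcp.accept(listen)
      {:ok, data} = :gen_tcp.recv(sock, 0)
      :ok = :gen_tcp.send(sock, data)
    end)

    {:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false])
    :ok = :gen_tcp.send(client, "hello")
    assert {:ok, "hello"} = :gen_tcp.recv(client, 0)
  end
end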

Add Open Telemetry support

  • should be able to send spans to my own otel vendor
  • start with just a few easy otel events and we can expand from there in another issue (see the sketch below)
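
A minimal span example, assuming the standard OpenTelemetry Elixir API (opentelemetry_api); the span name is illustrative:

require OpenTelemetry.Tracer, as: Tracer

Tracer.with_span "supavisor.pool.checkout" do
  # ... checkout logic would go here ...
  :ok
end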

Register a hook

Note: Hooks v1 should be scoped to web hooks only. Break out function hooks into another issue later. Including here for comms on the final end state.

Hooks

We want to let users run semi-arbitrary code when certain things happen in the system. Users may want to do things when pools are spawned or destroyed. When certain kinds of SQL statements are submitted or when certain records are updated in the database.

  • A Hook can be a function hook (Postgres function call) or a web hook (http request)
  • Hooks can be sync or async

Function hooks

  • When you create a function hook it creates a hook record in the metadata database with the code required to create a Postgres function using a Postgres TLE language.
  • When a connection pool starts up Supavisor will create or replace all registered function hooks in the tenant database ensuring they exist.

Web hooks

  • Supavisor will manage making the http requests for web hooks.

Async hooks

  • Async hooks can be implemented as Telemetry events
  • A Telemetry event handler for a tenant database can be attached to telemetry events based on hooks for a tenant in the metadata database
  • Example: like Realtime's Telemetry.Logger but instead of logging it would dynamically do what was registered with the hook, and the Telemetry events could be registered dynamically

Sync hooks

  • Sync hooks should be run between hook events and block until the hook responds

Hook events

Hook events will be generated from:

  • sql queries
  • wal records
  • tenant db metrics polling (e.g. db size - see #88)

Unknown:

  • How would we let users address these?

Examples

As a db admin I want to turn my database to read-only mode when it has used a certain amount of storage

  • polling metric queries db size every minute
  • polling process emits a Telemetry event with db size
  • Telemetry hook handler matches on db size to trigger
  • Function registered with hook is called and includes ALTER SYSTEM SET default_transaction_read_only TO on;
  • The database is now in read-only mode (see the sketch after this list)
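
Sketched as a telemetry event handler; the event name, measurement key, and threshold are assumptions:

:telemetry.attach(
  "db-size-read-only-hook",
  [:supavisor, :tenant, :db_size],
  fn _event, %{bytes: bytes}, %{tenant: tenant}, _config ->
    if bytes > 8 * 1024 * 1024 * 1024 do
      # The function hook registered for this tenant would run:
      #   ALTER SYSTEM SET default_transaction_read_only TO on;
      IO.puts("tenant #{tenant} is over its storage limit; firing read-only hook")
    end
  end,
  nil
)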

As a dev I want my Stripe data updated in Stripe when I update a customer record in my database:

  • Register a synchronous function hook with code which will use http to make a request to Stripe
  • This hook will run on insert statements on a customers table for example
  • The insert will error to the SQL client if Stripe responds with an error

Prior Art

Buffer database connections

As a database admin I should be able to restart Postgres without affecting client connections.

Requirements:

  • Buffer size limit

Unknowns:

  • What do we do when the database restarts in the middle of a transaction?

Supavisor fails to query Postgres version 13.10 or earlier

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

For Postgres version 13.10 or earlier, Supavisor is able to connect but unable to complete a query.

To Reproduce

Steps to reproduce the behavior, please provide code snippets or a repository:

  1. Clone and install Supavisor per the docs
  2. Attempt to connect to Postgres version 13.10 or earlier through Supavisor.
  3. Run select 1; or a similar query.
  4. See error: [error] Connection closed when state was idle

Expected behavior

Supavisor is able to query supported Postgres versions 11.19 and up.

Screenshots

n/a

System information

  • OS: macOS Ventura 13.2.1
  • Postgres 11.19, 12.14, or 13.10
  • Version of supabase/supavisor: latest main (5d16420)

Additional context

The full error output is:

10:55:37.184 [error] Connection closed when state was idle file=lib/supavisor/db_handler.ex line=199 pid=<0.896.0> project=dev_tenant user=postgres

The dockerfile for the target Postgres instance is:

version: "3"

services:
  postgres:
    image: "postgres:13.10"
    command: postgres -c shared_preload_libraries=pg_stat_statements -c pg_stat_statements.track=all -c max_connections=200 -c pg_stat_statements.max=10000 -c track_activity_query_size=2048
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres

Session pool mode

  • pool_mode should live on the tenant record and be one of session or transaction; the default should be transaction
  • client gets a database connection for the life of the client connection
  • When one disconnects the other should disconnect
  • db pool size should be related
  • session_mode_warm_size integer to keep

Pick a name

I don't love pg_edge. What other options do we have?

  • pg_edge_bouncer
  • pg_ex_pool
  • pg_potion
  • pg_stan h/t @abc3

Load test

  • 100_000 Postgres client connections to one Supavisor cluster
  • One query issued successfully
  • One tenant database

Optionally (if no major blockers):

  • 1_000_000 connections to one Supavisor cluster
  • One query issued successfully
  • One tenant database

Wire protocol over WebSockets

It should be easy for people to use Supabase with Vercel Postgres.

Adds support for the wire protocol over WebSockets such that the @vercel/postgres client works. See:
https://vercel.com/blog/vercel-storage#vercel-postgres-complex-data-made-easy

@vercel/postgres wraps @neondatabase/serverless which uses WebSockets to transport the wire protocol because some serverless envs like Cloudflare don't give you a TCP connection.

To use @neondatabase/serverless you need a proxy which handles WebSocket connections and turns them into TCP Postgres connections.

The behavior of the proxy ultimately is here: https://github.com/neondatabase/wsproxy

Support named prepared statements

No pooler currently supports named prepared statements. Lots of SQL clients that use their own pool do, e.g. Elixir's Ecto uses named prepared statements by default and ships with its own pooling logic.

It seems reasonable for a pooler to be able to support them. Named prepared statements can increase throughput significantly because the query plan is cached.

Unnamed prepared statements do not use a cached query plan because it is deallocated with the next unnamed prepared statement.

If successfully created, a named prepared-statement object lasts till the end of the current session, unless explicitly destroyed. An unnamed prepared statement lasts only until the next Parse statement specifying the unnamed statement as destination is issued. ref

Questions

What if two different clients linked to one pg connection make two different prepared statements with the same name?

Because it's an async protocol, can we add metadata to messages that Postgres will include coming back? Then we could use that to route the right messages to the right clients.

Can we use pipelining to multiplex?

Ideas

  • Another pool mode
  • Pin a client connection to a db conn such that each client would always get the same db conn from the pool

There would probably be some contention here because clients would have to wait on a specific conn, but it would work. The question is whether the contention negates the perf gain from the prepared statement. With another mode, transaction pooling would work as expected, but you could use this new mode if it works for your workload and provides perf gains.

Metric endpoint per tenant

We need to expose a metric endpoint per tenant where all the metrics are only for a certain tenant.

  • /metrics/:external_id
  • filter per tenant metrics out of the /metrics endpoint so it stays small
  • prioritize these metrics:
(screenshot of the prioritized metrics in the original issue)

Webhooks for events

Webhooks for certain events:

  • before pool starts - sync and async option (sync should block starting pool until webhook responds)
  • after pool stopped

Configurable `ext_id_location` per tenant

We need to pull the external_id from somewhere. We can pull it from the credentials or the sni_host.

  • first check the username: if there is a period in it, use the string before the period to look up the tenant
  • if there is no . in the username and the connection is SSL, use the sni_host for the lookup (see the sketch below)
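
A sketch of that lookup order; following the connection-string examples elsewhere in this document it treats the part after the period as the external_id (exactly the kind of positioning this issue proposes making configurable), and the function name is illustrative:

defmodule ExternalIdSketch do
  # First try the username ("<db_user>.<external_id>"); otherwise fall back to
  # the SNI host for SSL connections.
  def external_id(username, sni_host) do
    case String.split(username, ".", parts: 2) do
      [_db_user, external_id] -> {:ok, external_id}
      [_no_period] when is_binary(sni_host) -> {:ok, sni_host}
      _ -> {:error, :not_found}
    end
  end
end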

Connection maintenance

  • Monitor the connection for cleanup
  • What does Ranch do when the client closes the connection?
  • If the connection goes bad, make sure to disconnect from Postgres so it can clean up

Start pools even if user isn't in user table

  • Use some "default user pool config" with a pool_size
  • pool_type should be based on port maybe - 5432 = session and 6543 = transaction
  • Flag on the tenant record for require_user; if require_user is false, we allow user creds to be pulled from the connection string or another method
  • require_user field default false
  • Authentication of user to tenant database will happen on pool start and error to client if auth fails
  • tenant username and pass stored in Cachex
  • Docs - overview of this behavior
  • Docs - tell people what happens if they change username/pass in tenant db
  • Docs - example auth user query

Multiple pools per tenant

We have multiple microservices using tenant databases. We don't want one service (e.g. Storage @inian) checking out all database connections during (potentially) long transactions.

You could just put some simple metadata in the tenant external_id. So the external_id would be a hash of the tenant (for Supabase it's the project id) and some other key.

Example: mfrkmguhoejspftfvgdz_storage; when the Storage service connects, the connection string would be psql postgresql://postgres.mfrkmguhoejspftfvgdz_storage:postgres@localhost:7654/postgres

Or should we have some more official pool_key on the tenant record where the username in the connection string would be external_id <> _ <> pool_key?

############## edit ################

Implementation

@abc3 and I discussed:

  • user table with multiple users per tenant. A user is a Postgres user.
  • schema see below
  • spin up a connection pool per tenant user on connect
  • db_user_alias field will be an alias of the db_user field so that you can run a transaction and a session pool with the same Postgres user at the same time
  • is_manager field will flag a user to be used for eventual database management functionality which only Postgres admins can do when needed

schema

defmodule Supavisor.Tenants.User do
  # Wrapped in a schema module for readability; the module and table names are
  # illustrative, the fields are as proposed above.
  use Ecto.Schema

  schema "users" do
    field(:db_host, :string)
    field(:db_password, :binary)
    field(:db_port, :integer)
    field(:db_user, :string)
    field(:db_user_alias, :string)
    field(:is_manager, :boolean, default: false)
    field(:mode_type, :string)
    field(:pool_size, :integer)
    belongs_to(:tenant, Supavisor.Tenants.Tenant, foreign_key: :tenant_external_id, type: :string)
  end
end
