Supavisor

Supavisor - Postgres connection pooler

Overview

Supavisor is a scalable, cloud-native Postgres connection pooler. A Supavisor cluster is capable of proxying millions of Postgres end-client connections into a stateful pool of native Postgres database connections.

For database managers, Supavisor simplifies the task of managing Postgres clusters by providing easy configuration of highly available Postgres clusters (todo).

Motivation

We have several goals with Supavisor:

  • Zero-downtime scaling: we want to scale Postgres server compute with zero downtime. To do this, we need an external Pooler that can buffer and re-route requests while the resizing operation is in progress.
  • Handling modern connection demands: We need a Pooler that can absorb millions of connections. We often see developers connecting to Postgres from Serverless environments, and so we also need something that works with both TCP and HTTP protocols.
  • Efficiency: Our customers pay for database processing power, and our goal is to maximize their database capacity. While PgBouncer is resource-efficient, it still consumes some resources on the database instance. By moving connection pooling to a dedicated cluster adjacent to tenant databases, we can free up additional resources to better serve customer queries.

Architecture

Supavisor was designed to work in a cloud computing environment as a highly available cluster of nodes. Tenant configuration is stored in a highly available Postgres database. Configuration is loaded from the Supavisor database when a tenant connection pool is initiated.

Connection pools are dynamic. When a tenant client connects to the Supavisor cluster the tenant pool is started and all connections to the tenant database are established. The process ID of the new tenant pool is then distributed to all nodes of the cluster and stored in an in-memory key-value store. Subsequent tenant client connections live on the inbound node but connection data is proxied from the pool node to the client connection node as needed.

Because the number of Postgres connections is constrained, only one tenant connection pool should be alive in a Supavisor cluster. If two simultaneous client connections each start a pool, the pool process IDs are distributed across the cluster and one of the pools is eventually shut down gracefully.
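
A minimal sketch of that registration flow, assuming the syn library mentioned in the issues below serves as the distributed registry; the scope and key names are illustrative:

# Illustrative only: register a tenant pool once per cluster and fall back to
# an existing registration if another node won the race.
tenant_id = "dev_tenant"
pool_pid = self()

:ok = :syn.add_node_to_scopes([:tenants])

case :syn.register(:tenants, {:pool, tenant_id}, pool_pid) do
  :ok ->
    {:ok, pool_pid}

  {:error, :taken} ->
    # Another node already started this tenant's pool; use that one and let
    # the duplicate shut down gracefully.
    {existing_pid, _meta} = :syn.lookup(:tenants, {:pool, tenant_id})
    {:ok, existing_pid}
end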

The dynamic nature of tenant database connection pools enables high availability in the event of node outages. Pool processes are monitored by each node. If a node goes down, that process ID is removed from the cluster. Tenant clients will then start a new pool automatically as they reconnect to the cluster.

This design enables blue-green or rolling deployments as upgrades require. A single-VPC, multiple-availability-zone topology is possible and can provide greater redundancy once load balancing queries across read replicas is supported (todo).

Docs

Features

  • Fast
    • Within 90% of PgBouncer's throughput when running pgbench locally
  • Scalable
    • 1 million Postgres connections on a cluster
    • 250_000 idle connections on a single 16-core node with 64GB of RAM
  • Multi-tenant
    • Connect to multiple different Postgres instances/clusters
  • Single-tenant
    • Easy drop-in replacement for PgBouncer
  • Pool mode support per tenant
    • Transaction
  • Cloud-native
    • Cluster-able
    • Resilient during cluster resizing
    • Supports rolling and blue/green deployment strategies
    • NOT run in a serverless environment
    • NOT dependent on Kubernetes
  • Observable
    • Easily understand throughput by tenant, tenant database or individual connection
    • Prometheus /metrics endpoint
  • Manageable
    • OpenAPI spec at /api/openapi
    • SwaggerUI at /swaggerui
  • Highly available
    • When deployed as a Supavisor cluster and a node dies, connection pools are quickly spun up or already available on other nodes when clients reconnect
  • Connection buffering
    • Brief connection buffering for transparent database restarts or failovers

Future Work

  • Load balancing
    • Queries can be load balanced across read-replicas
    • Load balancing is independent of Postgres high-availability management (see below)
  • Query caching
    • Query results are optionally cached in the pool cluster and returned before hitting the tenant database
  • Session pooling
    • Like PgBouncer
  • Multi-protocol Postgres query interface
    • Postgres binary
    • HTTPS
    • Websocket
  • Postgres high-availability management
    • Primary database election on primary failure
    • Health checks
    • Push button read-replica configuration
  • Config as code
    • Not only for the supavisor cluster but tenant databases and tenant database clusters as well
    • Pulumi / terraform support

Benchmarks

Local Benchmarks

  • Running pgbench on PgBouncer (transaction mode/pool size 60)
PGPASSWORD=postgres pgbench -M extended --transactions 100 --jobs 10 --client 100 -h localhost -p 6452 -U postgres postgres
pgbench (15.2, server 14.6 (Debian 14.6-1.pgdg110+1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: extended
number of clients: 100
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 510.310 ms
initial connection time = 31.388 ms
tps = 195.959361 (without initial connection time)
  • Running pgbench on Supavisor (pool size 60, no logs)
PGPASSWORD=postgres pgbench -M extended --transactions 100 --jobs 10 --client 100 -h localhost -p 7654 -U postgres.localhost postgres
pgbench (15.2, server 14.6 (Debian 14.6-1.pgdg110+1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: extended
number of clients: 100
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 528.463 ms
initial connection time = 178.591 ms
tps = 189.228103 (without initial connection time)

Load Test

Supavisor load test virtual users chart

Supavisor load test qps chart

  • Supavisor two-node cluster
    • 64 vCPU / 246 GB RAM
    • Ubuntu 22.04.2 aarch64
  • 1_003_200 concurrent client connections
  • 20_000+ QPS
  • 400 tenant Postgres connections
  • select * from (values (1, 'one'), (2, 'two'), (3, 'three')) as t (num,letter);
  • ~50% CPU utilization (pool owner node)
  • 7.8G RAM usage

Acknowledgements

José Valim and the Dashbit team were incredibly helpful in informing the design decisions for Supavisor.

Inspiration

Commercial Inspiration

supavisor's Issues

Support limiting for postgres-related statistics

Chore

Describe the chore

Many providers (like Supabase) require database requests to be throttled or rejected based on certain thresholds. These are typically:

  • Max storage
  • Egress over a certain period
  • Total egress per request
  • CPU (once it gets high, perhaps some sort of throttling)

The most important/easiest is the first: Max Storage. Supabase typically puts a database into read-only mode once it gets close to disk limits. This could be better handled in the proxy, because often a user starts uploading terabytes of data and their database fills up too fast to issue any commands.

For the implementation, it could be good to poll the database statistics periodically and store them with the tenant config or in some faster cache.
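
One possible shape for that polling, sketched with Postgrex and Cachex (both referenced elsewhere in this document); the query, cache name, and key are illustrative:

# Illustrative sketch: poll the tenant database size and cache it for limit checks.
# Assumes a Cachex cache named :tenant_stats has already been started.
tenant_id = "dev_tenant"

{:ok, conn} =
  Postgrex.start_link(
    hostname: "localhost",
    username: "postgres",
    password: "postgres",
    database: "postgres"
  )

%Postgrex.Result{rows: [[db_size_bytes]]} =
  Postgrex.query!(conn, "select pg_database_size(current_database())", [])

Cachex.put(:tenant_stats, {tenant_id, :db_size}, db_size_bytes)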

Additional context

Consolidate client and server ends of the protocol in a single module

Since you're developing both client and server ends of the protocol I'd consider having a single module that does encoding and decoding:

defmodule PgEdge.Protocol do
  def encode_startup_message(input)

  def decode_startup_message(input)
end

I think keeping the code collocated will make it easier to maintain and make sure it is correct. Another benefit is keeping "symmetry", ideally we'd have:

assert message == message |> encode_startup_message() |> decode_startup_message()

Since you are already using :pgo_protocol.encode_startup_message, I'd definitely continue doing so. I'd just wrap it with the extra function to ensure the interface I mentioned above. Another benefit is if you ever choose to use a different underlying library or hand-roll encoding, the rest of the system would be unaffected because they'd only know about Protocol.encode_startup_message function.

I think another reason it is fine to keep a single module is we won't have to implement the entirety of the pg wire protocol, just the bits that we need which are around establishing the connection (and transaction control if we choose to go down that path), it is fine if the rest of the protocol is opaque to our proxy and we just forward it as is without any decoding/re-encoding.

Originally posted by @wojtekmach in #4 (comment)
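
To make the suggested round-trip property concrete, here is a self-contained sketch of encoding and decoding the Postgres StartupMessage; the module name is illustrative and this is not Supavisor's actual implementation:

defmodule ProtocolSketch do
  @protocol_version 196_608  # Postgres wire protocol 3.0

  def encode_startup_message(params) do
    # key\0value\0 pairs, terminated by a final null byte.
    pairs = Enum.map_join(params, fn {k, v} -> <<k::binary, 0, v::binary, 0>> end)
    body = <<@protocol_version::32, pairs::binary, 0>>
    <<byte_size(body) + 4::32, body::binary>>
  end

  def decode_startup_message(<<_len::32, @protocol_version::32, rest::binary>>) do
    rest
    |> :binary.split(<<0>>, [:global, :trim_all])
    |> Enum.chunk_every(2)
    |> Enum.map(fn [k, v] -> {k, v} end)
  end
end

# The "symmetry" property from the comment above:
params = [{"user", "postgres"}, {"database", "postgres"}]
^params = params |> ProtocolSketch.encode_startup_message() |> ProtocolSketch.decode_startup_message()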

Initial metrics

  • Network usage of all tenants
    • Poll the inet Erlang module periodically (see the sketch after this list)
  • Total queries, etc.
  • Pool stats
    • Actual db pool size
    • Connected clients
    • Pool checkout queue time

Route to list all tenants raises an error when there are tenants to list

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

Route to get tenants http://localhost:4000/api/tenants raises an error when there are tenants to list

To Reproduce

Steps to reproduce the behavior, please provide code snippets or a repository:

  1. Launch the Supavisor application with make dev
  2. Add a tenant
curl -X PUT   'http://localhost:4000/api/tenants/dev_tenant'   --header 'Accept: application/json'   --header 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJvbGUiOiJhbm9uIiwiaWF0IjoxNjQ1MTkyODI0LCJleHAiOjE5NjA3Njg4MjR9.M9jrxyvPLkUxWgOYSf5dNdJ8v_eRrq810ShFRT8N-6M'   --header 'Content-Type: application/json'   --data-raw '{
  "tenant": {
    "db_host": "localhost",
    "db_port": 6432,
    "db_database": "postgres",
    "users": [
      {
        "db_user": "postgres",
        "db_password": "postgres",
        "pool_size": 20,
        "mode_type": "transaction"
      }
    ]
  }
}'
  3. List all tenants
curl -X 'GET'   'http://localhost:4000/api/tenants' --header 'Accept: application/json'   --header 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJvbGUiOiJhbm9uIiwiaWF0IjoxNjQ1MTkyODI0LCJleHAiOjE5NjA3Njg4MjR9.M9jrxyvPLkUxWgOYSf5dNdJ8v_eRrq810ShFRT8N-6M'   --header 'Content-Type: application/json'
  4. The error will be raised right away

Expected behavior

No error should be raised and info on all tenants should be displayed

Screenshots

(screenshot of the error in the original issue)

System information

  • OS: Ubuntu 22.04.1 LTS
  • Supavisor's last commit 443c46b

Additional context

N/A

Comment

If this is not a bug, please enlighten me on what I did wrong, as I just followed the steps in Installation-and-Usage

Supavisor needs tenant's password to do the routing using its proxy

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

After registering a tenant with its credentials using the PUT API, I expected to use the proxy by passing the user prefix, the tenant id, and the DB password of the Supavisor DB; Supavisor would then internally retrieve the tenant's credentials and do the expected routing for us.

Right now I need to pass the user prefix, the tenant id, and the password of the tenant's database. This doesn't make sense, because we already have this password saved, right?

To Reproduce

curl --location --request PUT 'localhost:4000/api/tenants/84f9b050-6e82-4fda-a904-24958a7d184d' \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2ODAxNjIxNTR9.U9orU6YYqXAtpF8uAiw6MS553tm4XxRzxOhz2IwDhpY' \
--data '{
  "tenant": {
    "db_database": "tenant1",
    "db_host": "localhost",
    "db_port": 5432,
    "users": [
      {
        "db_password": "tenant1",
        "db_user": "tenant1",
        "mode_type": "transaction",
        "pool_size": 10
      }
    ]
  }
}'

  • Try to access the proxy with the password of the supavisor DB:
psql postgresql://tenant1.84f9b050-6e82-4fda-a904-24958a7d184d:password@localhost:7654/tenant1

Will fail with:

psql: error: connection to server at "localhost" (::1), port 7654 failed: error received from server in SCRAM exchange: Invalid client signature

  • Try to access the proxy with the password of the tenant DB:
psql postgresql://tenant1.84f9b050-6e82-4fda-a904-24958a7d184d:tenant1@localhost:7654/tenant1

Will work

Expected behavior

Be able to connect passing only the password of supavisor's DB.

System information

  • OS: macOS & Linux
  • Version of supavisor: #96

Add suggestion to set FLY_APP_NAME if you're not running on Fly

Improve documentation

Link

https://github.com/supabase/supavisor/wiki/Installation-and-Usage

Describe the problem

This page assumes the deployment will be made using Fly, but during some local tests using MIX_ENV=prod, we got the error "Invalid node name! Please check your configuration" when starting the application.

Describe the improvement

Clarify that if you're not running on Fly, you need to set the FLY_APP_NAME env var yourself (this var is set automatically when running on Fly).

Additional context

env.sh.eex

Full error due to the missing FLY_APP_NAME:

=INFO REPORT==== 23-May-2023::18:42:21.198353 ===
Invalid node name!
Please check your configuration

=SUPERVISOR REPORT==== 23-May-2023::18:42:21.198873 ===
    supervisor: {local,net_sup}
    errorContext: start_error
    reason: {'EXIT',nodistribution}
    offender: [{pid,undefined},
               {id,net_kernel},
               {mfargs,{net_kernel,start_link,
                                   [#{clean_halt => true,name => '@127.0.0.1',
                                      name_domain => longnames,
                                      supervisor => net_sup}]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,2000},
               {child_type,worker}]

=CRASH REPORT==== 23-May-2023::18:42:21.200453 ===
  crasher:
    initial call: net_kernel:init/1
    pid: <0.2015.0>
    registered_name: []
    exception exit: {error,badarg}
      in function  gen_server:init_it/6 (gen_server.erl, line 835)
    ancestors: [net_sup,kernel_sup,<0.2001.0>]
    message_queue_len: 0
    messages: []
    links: [<0.2012.0>]
    dictionary: [{longnames,true}]
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 28
    reductions: 327
  neighbours:

=SUPERVISOR REPORT==== 23-May-2023::18:42:21.200885 ===
    supervisor: {local,kernel_sup}
    errorContext: start_error
    reason: {shutdown,
                {failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}
    offender: [{pid,undefined},
               {id,net_sup},
               {mfargs,{erl_distribution,start_link,[]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,infinity},
               {child_type,supervisor}]

=CRASH REPORT==== 23-May-2023::18:42:21.205684 ===
  crasher:
    initial call: application_master:init/4
    pid: <0.2000.0>
    registered_name: []
    exception exit: {{shutdown,
                         {failed_to_start_child,net_sup,
                             {shutdown,
                                 {failed_to_start_child,net_kernel,
                                     {'EXIT',nodistribution}}}}},
                     {kernel,start,[normal,[]]}}
      in function  application_master:init/4 (application_master.erl, line 142)
    ancestors: [<0.1999.0>]
    message_queue_len: 1
    messages: [{'EXIT',<0.2001.0>,normal}]
    links: [<0.1999.0>,<0.1998.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 28
    reductions: 167
  neighbours:

=INFO REPORT==== 23-May-2023::18:42:21.207911 ===
    application: kernel
    exited: {{shutdown,
                 {failed_to_start_child,net_sup,
                     {shutdown,
                         {failed_to_start_child,net_kernel,
                             {'EXIT',nodistribution}}}}},
             {kernel,start,[normal,[]]}}
    type: permanent

{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{kernel,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{kernel,start,[normal,[]]}}})

Crash dump is being written to: erl_crash.dump...done

Update proxy architecture for better performance

As I alluded to in a couple of comments, I'm afraid the current architecture is not going to be as efficient as possible due to the overhead of copying messages through many processes. AFAIR poolboy sends a message to the pool process and then to a worker process, so that's two copies right there. For performance reasons we definitely need to rule out poolboy.

Whether we should use NimblePool or another pool library or roll something custom really depends on the architecture you have in mind. If you're sure you need pools, then we're looking at implementing transaction control, which is not going to be trivial. While we very well might end up exactly there, with pools, we think there's a different starting point (we definitely need to start somewhere!): no pools, no extra processes. That is, in the ranch listener callback we'd connect to the actual pg backend instance. That ranch process would be the one owning the socket, the only one interacting with it, and upon receiving data it would just :gen_tcp.send it. We'd never copy data, just pass it forward. I think even with this simplistic architecture we can implement a lot of features already:

  • we'd know when the pg backend disconnects (:tcp_closed message) and can act accordingly
  • we can keep a list of known pg backend instances and can choose which one to connect to: randomly, round-robin, etc.

The latter point in particular is, I think, pretty compelling: our proxy can already help spread the load.

What do you think?

Originally posted by @wojtekmach in #4 (review)
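
For reference, a minimal sketch of the "no pools, no extra processes" approach described above, assuming ranch 2.x with the ranch_tcp transport; the module name and upstream address are illustrative:

defmodule ProxySketch do
  # The ranch connection process owns the client socket, opens one upstream
  # Postgres connection, and forwards bytes both ways without copying them
  # through intermediate processes.
  @behaviour :ranch_protocol

  def start_link(ref, transport, opts) do
    {:ok, :proc_lib.spawn_link(__MODULE__, :init, [ref, transport, opts])}
  end

  def init(ref, transport, _opts) do
    {:ok, client} = :ranch.handshake(ref)
    {:ok, upstream} = :gen_tcp.connect({127, 0, 0, 1}, 5432, [:binary, active: true])
    transport.setopts(client, active: true)
    loop(transport, client, upstream)
  end

  defp loop(transport, client, upstream) do
    receive do
      {:tcp, ^client, data} ->
        :ok = :gen_tcp.send(upstream, data)
        loop(transport, client, upstream)

      {:tcp, ^upstream, data} ->
        :ok = transport.send(client, data)
        loop(transport, client, upstream)

      {:tcp_closed, _socket} ->
        # Either side went away; tear down both ends.
        :gen_tcp.close(upstream)
        transport.close(client)
    end
  end
end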

db connection pool should live on one node of the cluster

edit: decided to go with the same architecture as Realtime

Except:

  • Spin up 25% of the Postgres max connection limit per pool
    • v0.0.0 will just use value from metadata db
    • v1.0.0 spin up 10 on init connection and async query how many to actually spin up
  • If node_a can't connect to node_b start on self()
    • v1.0.0 maybe have a pool on two nodes, if rpc doesn't work to acquire a connection use the other node

Should we do it like Supabase Realtime?

Supabase Realtime Overview

  • uses syn library for distributed registry
  • deterministically pick a node by tenant id (e.g. :erlang.phash/2; see the sketch at the end of this issue)
  • picks fly region mapped to aws region where tenant db lives
  • websocket connection comes into fra and it gets registered with syn
  • node in iad knows via syn to send wal events to websocket connection process on fra node

erlang-routes-conns

Or should we have some tenant aware proxy which routes connections to a node where tenant db pool lives?

Option with HAProxy

  • HAProxy somehow knows which node to route connections to

load-balancer-routes-conns

Option 3

  • We don't care where the client connection goes, and database connection pools get spun up on each Erlang node a client connects to.

tenant-pool-on-each-erlang-node
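
For reference, a sketch of the deterministic pick-a-node-by-tenant idea from the Realtime overview above (using :erlang.phash2/2; names are illustrative):

# Every node computes the same owner node for a given tenant id.
tenant_id = "dev_tenant"
nodes = Enum.sort([node() | Node.list()])
owner_node = Enum.at(nodes, :erlang.phash2(tenant_id, length(nodes)))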

Race condition on `Process.unlink` when `mode` is `transaction`

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

In handle_event({:call, from}, {:client_call, bin, ready?}, data) (handler for database responses), if the client's mode is transaction and we received a ready?=true from the database, we check the db_pid back in and set it to nil.

The next time the db_pid is checked out and set in state is when the client makes another request. However, there are situations where the database will call the ClientHandler again before the client makes another request.

i.e. receiving ReadyForQuery is not a guarantee that the database won't send data until the next client request.

Currently, handle_db_pid fails when this happens, as we're trying to run Process.unlink(nil).

The solution is to ignore a nil db_pid in handle_db_pid. I think that's OK? But I don't know if getting into this state was intended behavior.

To Reproduce

What seems to be happening:

  1. Db sends a ReadyForQuery after finishing the last query.
  2. Client sends close and sync, seemingly to "wrap up" the last query. In the same bin, Client also sends parse, describe, sync - i.e., the next query.
  3. Db sends back close_complete then ReadyForQuery in its own bin.
  4. Then, it sends back the response to the subsequent query.

This is happening with the Prisma ORM client.

Expected behavior

ClientHandler does not crash when Db responds >1 time per client query.

Initial tests

  • if syn has two connection pools alive on a cluster, the first connection pool should win

`server_version` not sent in parameter_status at end of authentication

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

Many clients will error when connecting to Supavisor. For example, consider Postgrex:

14:05:01.372 mfa=:gen_statem.error_info/7 pid=<0.1963.0> [error] GenStateMachine #PID<0.1963.0> terminating
** (FunctionClauseError) no function clause matching in String.split/3
    (elixir 1.14.3) lib/string.ex:479: String.split(nil, " ", [parts: 2])
    (postgrex 0.16.5) lib/postgrex/utils.ex:68: Postgrex.Utils.parse_version/1
    (postgrex 0.16.5) lib/postgrex/protocol.ex:1019: Postgrex.Protocol.bootstrap_send/3
    (postgrex 0.16.5) lib/postgrex/protocol.ex:661: Postgrex.Protocol.handshake/2
    (postgrex 0.16.5) lib/postgrex/protocol.ex:166: Postgrex.Protocol.connect_endpoints/6
    (db_connection 2.5.0) lib/db_connection/connection.ex:90: DBConnection.Connection.handle_event/4
    (stdlib 4.3.1) gen_statem.erl:1428: :gen_statem.loop_state_callback/11
    (stdlib 4.3.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3

This is because after sending AuthenticationOk, the server is expected to send ParameterStatus. Supavisor does this here.

However, there's a race condition. If the DbHandler hasn't finished its authentication flow, the Manager will not have the parameter_status set -- it will be its default []. This means clients will be sent nothing for ParameterStatus, which seems to cause errors in eg Postgrex.

Once parameter_status is set (the DbHandler has finished authenticating and has set that state in Manager), things work normally.

To Reproduce

Run Postgrex.start_link against a cold Supavisor instance.

Expected behavior

Postgrex.start_link starts without any errors.

Implementation

I'm happy to help with implementation, but I struggled a bit with design. I considered having Manager.get_parameter_status hang until parameter_status was set. But this feels like a workaround. If DbHandler fails to connect (auth is actually bad), ClientHandler will hang inexplicably.

I think we really want the ClientHandler to be intimately connected to the initial connection process to the database. Not only does it care about the ParameterStatus coming from the db. It should also proxy an error to the client if there is a login error (host actually not reachable, password is bad). I wonder if it should run the initial auth process against the db? Or, maybe, if the Manager was just booted (this is a fresh connection pool), it should await some startup process which includes running auth and grabbing ParameterStatus from the db.

`ClientHandler` and `DbHandler` can deadlock

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

ClientHandler uses GenServer.call to send messages to DbHandler. Likewise, DbHandler uses GenServer.call to send database responses back to ClientHandler.

Alas, this can deadlock.

To Reproduce

I can reliably reproduce when running a bunch of heavy queries with a lot of data each direction. But basically, the two sockets just need to receive data at roughly the same time for a deadlock to occur. You can increase chances of it happening by introducing a small wait in either handle_event.

Ideas

Perhaps DbHandler can fire-and-forget its messages to ClientHandler (e.g. cast)?

Alternatively, one of them probably needs to spin up a process to send data to the other one (in order to async hear back if it went OK).
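
A sketch of the cast-based option, assuming ClientHandler owns the client socket; the module shape is illustrative, not Supavisor's actual handler:

defmodule ClientHandlerSketch do
  use GenServer

  def init(socket), do: {:ok, socket}

  # DbHandler would send database bytes with:
  #   GenServer.cast(client_handler, {:db_data, bin})
  # Neither process blocks waiting on the other, so the call/call deadlock
  # described above cannot occur.
  def handle_cast({:db_data, bin}, socket) do
    :ok = :gen_tcp.send(socket, bin)
    {:noreply, socket}
  end
end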

tests: try to reduce mocking

This is a separate discussion (apologies, I don't think we had one about it!), but while mocking is pretty convenient here, I think it provides less confidence and a more fragile (totally coupled to the implementation) test suite. Since it's trivial to spawn TCP sockets on the BEAM, I'd consider doing just that and testing against them.

Originally posted by @wojtekmach in #124 (comment)
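
A sketch of that approach: an ExUnit test that stands up a throwaway TCP "backend" with :gen_tcp instead of a mock:

defmodule RealSocketTest do
  use ExUnit.Case, async: true

  test "talks to a fake backend over a real socket instead of a mock" do
    # Fake "database" that echoes whatever it receives.
    {:ok, listen} = :gen_tcp.listen(0, [:binary, active: false, reuseaddr: true])
    {:ok, port} = :inet.port(listen)

    spawn_link(fn ->
      {:ok, sock} = :gen_tcp.accept(listen)
      {:ok, data} = :gen_tcp.recv(sock, 0)
      :ok = :gen_tcp.send(sock, data)
    end)

    {:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false])
    :ok = :gen_tcp.send(client, "hello")
    assert {:ok, "hello"} = :gen_tcp.recv(client, 0)
  end
end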

Add Open Telemetry support

  • should be able to send spans to my own otel vendor
  • start with just a few easy otel events and we can expand from there in another issue (see the sketch below)
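
A minimal span example, assuming the standard OpenTelemetry Elixir API (opentelemetry_api); the span name is illustrative:

require OpenTelemetry.Tracer, as: Tracer

Tracer.with_span "supavisor.pool.checkout" do
  # ... checkout logic would go here ...
  :ok
end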

Register a hook

Note: Hooks v1 should be scoped to web hooks only. Break out function hooks into another issue later. Including here for comms on the final end state.

Hooks

We want to let users run semi-arbitrary code when certain things happen in the system. Users may want to do things when pools are spawned or destroyed. When certain kinds of SQL statements are submitted or when certain records are updated in the database.

  • A Hook can be a function hook (Postgres function call) or a web hook (http request)
  • Hooks can be sync or async

Function hooks

  • When you create a function hook it creates a hook record in the metadata database with the code required to create a Postgres function using a Postgres TLE language.
  • When a connection pool starts up Supavisor will create or replace all registered function hooks in the tenant database ensuring they exist.

Web hooks

  • Supavisor will manage making the http requests for web hooks.

Async hooks

  • Async hooks can be implemented as Telemetry events
  • A Telemetry event handler for a tenant database can be attached to telemetry events based on hooks for a tenant in the metadata database
  • Example: like Realtime's Telemetry.Logger but instead of logging it would dynamically do what was registered with the hook, and the Telemetry events could be registered dynamically

Sync hooks

  • Sync hooks should be run between hook events and block until the hook responds

Hook events

Hook events will be generated from:

  • sql queries
  • wal records
  • tenant db metrics polling (e.g. db size - see #88)

Unknown:

  • How would we let users address these?

Examples

As a db admin I want to turn my database to read-only mode when it has used a certain amount of storage

  • polling metric queries db size every minute
  • polling process emits a Telemetry event with db size
  • Telemetry hook handler matches on db size to trigger
  • Function registered with hook is called and includes ALTER SYSTEM SET default_transaction_read_only TO on;
  • The database is now in read-only mode (see the sketch after this list)
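
Sketched as a telemetry event handler; the event name, measurement key, and threshold are assumptions:

:telemetry.attach(
  "db-size-read-only-hook",
  [:supavisor, :tenant, :db_size],
  fn _event, %{bytes: bytes}, %{tenant: tenant}, _config ->
    if bytes > 8 * 1024 * 1024 * 1024 do
      # The function hook registered for this tenant would run:
      #   ALTER SYSTEM SET default_transaction_read_only TO on;
      IO.puts("tenant #{tenant} is over its storage limit; firing read-only hook")
    end
  end,
  nil
)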

As a dev I want my Stripe data updated in Stripe when I update a customer record in my database:

  • Register a synchronous function hook with code which will use http to make a request to Stripe
  • This hook will run on insert statements on a customers table for example
  • The insert will error to the SQL client if Stripe responds with an error

Prior Art

Buffer database connections

As a database admin I should be able to restart Postgres without affecting client connections.

Requirements:

  • Buffer size limit

Unknowns:

  • What do we do when the database restarts in the middle of a transaction?

Supavisor fails to query Postgres version 13.10 or earlier

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

For Postgres version 13.10 or earlier, Supavisor is able to connect but unable to complete a query.

To Reproduce

Steps to reproduce the behavior, please provide code snippets or a repository:

  1. Clone and install Supavisor per the docs
  2. Attempt to connect to Postgres version 13.10 or earlier through Supavisor.
  3. Run select 1; or a similar query.
  4. See error: [error] Connection closed when state was idle

Expected behavior

Supavisor is able to query supported Postgres versions 11.19 and up.

Screenshots

n/a

System information

  • OS: macOS Ventura 13.2.1
  • Postgres 11.19, 12.14, or 13.10
  • Version of supabase/supavisor: latest main (5d16420)

Additional context

The full error output is:

10:55:37.184 [error] Connection closed when state was idle file=lib/supavisor/db_handler.ex line=199 pid=<0.896.0> project=dev_tenant user=postgres

The dockerfile for the target Postgres instance is:

version: "3"

services:
  postgres:
    image: "postgres:13.10"
    command: postgres -c shared_preload_libraries=pg_stat_statements -c pg_stat_statements.track=all -c max_connections=200 -c pg_stat_statements.max=10000 -c track_activity_query_size=2048
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres

Session pool mode

  • pool_mode should live on the tenant record and be one of session or transaction; the default should be transaction
  • client gets a database connection for the life of the client connection
  • When one disconnects the other should disconnect
  • db pool size should be related
  • session_mode_warm_size integer to keep

Pick a name

I don't love pg_edge. What other options do we have?

  • pg_edge_bouncer
  • pg_ex_pool
  • pg_potion
  • pg_stan h/t @abc3

Load test

  • 100_000 Postgres client connections to one Supavisor cluster
  • One query issued successfully
  • One tenant database

Optionally (if no major blockers):

  • 1_000_000 connections to one Supavisor cluster
  • One query issued successfully
  • One tenant database

Wire protocol over WebSockets

It should be easy for people to use Supabase with Vercel Postgres.

Adds support for the wire protocol over WebSockets such that the @vercel/postgres client works. See:
https://vercel.com/blog/vercel-storage#vercel-postgres-complex-data-made-easy

@vercel/postgres wraps @neondatabase/serverless which uses WebSockets to transport the wire protocol because some serverless envs like Cloudflare don't give you a TCP connection.

To use @neondatabase/serverless you need a proxy which handles WebSocket connections and turns them into TCP Postgres connections.

The behavior of the proxy ultimately is here: https://github.com/neondatabase/wsproxy

Support named prepared statements

No pooler currently supports named prepared statements. Lots of SQL clients that use their own pool do, e.g. Elixir's Ecto uses named prepared statements by default and ships with its own pooling logic.

It seems reasonable for a pooler to be able to support them. Named prepared statements can increase throughput significantly because the query plan is cached.

Unnamed prepared statements do not use a cached query plan because it is deallocated with the next unnamed prepared statement.

If successfully created, a named prepared-statement object lasts till the end of the current session, unless explicitly destroyed. An unnamed prepared statement lasts only until the next Parse statement specifying the unnamed statement as destination is issued. ref

Questions

What if two different clients linked to one pg connection make two different prepared statements with the same name?

Because it's an async protocol, can we add metadata to messages that Postgres will include coming back? Then we could use that to route the right messages to the right clients.

Can we use pipelining to multiplex?

Ideas

  • Another pool mode
  • Pin a client connection to a db conn such that each client would always get the same db conn from the pool

There would probably be some contention here because clients would have to wait on a specific conn, but it would work. The question is whether the contention negates the perf gain from the prepared statement. With another mode, transaction pooling would work as expected, but you could use this new mode if it works for your workload and provides perf gains.

Metric endpoint per tenant

We need to expose a metric endpoint per tenant where all the metrics are only for a certain tenant.

  • /metrics/:external_id
  • filter per tenant metrics out of the /metrics endpoint so it stays small
  • prioritize these metrics:
(screenshot of the prioritized metrics in the original issue)

Webhooks for events

Webhooks for certain events:

  • before pool starts - sync and async option (sync should block starting pool until webhook responds)
  • after pool stopped

Configurable `ext_id_location` per tenant

We need to pull the external_id from somewhere. We can pull it from the credentials or the sni_host.

  • first check the username: if there is a period in it, use the string before the period to look up the tenant
  • if there is no . in the username and the connection is SSL, use the sni_host for the lookup (see the sketch below)
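
A sketch of that lookup order; following the connection-string examples elsewhere in this document it treats the part after the period as the external_id (exactly the kind of positioning this issue proposes making configurable), and the function name is illustrative:

defmodule ExternalIdSketch do
  # First try the username ("<db_user>.<external_id>"); otherwise fall back to
  # the SNI host for SSL connections.
  def external_id(username, sni_host) do
    case String.split(username, ".", parts: 2) do
      [_db_user, external_id] -> {:ok, external_id}
      [_no_period] when is_binary(sni_host) -> {:ok, sni_host}
      _ -> {:error, :not_found}
    end
  end
end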

Connection maintenance

  • Monitor the connection for cleanup
  • What does Ranch do when the client closes the connection?
  • If the connection goes bad, make sure to disconnect from Postgres so it can clean up

Start pools even if user isn't in user table

  • Use some "default user pool config" with a pool_size
  • pool_type should be based on port maybe - 5432 = session and 6543 = transaction
  • Flag on the tenant record for require_user; if require_user is false, we allow user creds to be pulled from the connection string or another method
  • require_user field default false
  • Authentication of user to tenant database will happen on pool start and error to client if auth fails
  • tenant username and pass stored in Cachex
  • Docs - overview of this behavior
  • Docs - tell people what happens if they change username/pass in tenant db
  • Docs - example auth user query

Multiple pools per tenant

We have multiple microservices using tenant databases. We don't want one service (e.g. Storage @inian) checking out all database connections during (potentially) long transactions.

You could just put some simple metadata in the tenant external_id. So the external_id would be a hash of the tenant (for Supabase it's the project id) and some other key.

Example: mfrkmguhoejspftfvgdz_storage; when the Storage service connects, the connection string would be psql postgresql://postgres.mfrkmguhoejspftfvgdz_storage:postgres@localhost:7654/postgres

Or should we have some more official pool_key on the tenant record where the username in the connection string would be external_id <> _ <> pool_key?

############## edit ################

Implementation

@abc3 and I discussed:

  • user table with multiple users per tenant. A user is a Postgres user.
  • schema see below
  • spin up a connection pool per tenant user on connect
  • db_user_alias field will be an alias of the db_user field so that you can run a transaction and a session pool with the same Postgres user at the same time
  • is_manager field will flag a user to be used for eventual database management functionality which only Postgres admins can do when needed

schema

defmodule Supavisor.Tenants.User do
  # Wrapped in a schema module for readability; the module and table names are
  # illustrative, the fields are as proposed above.
  use Ecto.Schema

  schema "users" do
    field(:db_host, :string)
    field(:db_password, :binary)
    field(:db_port, :integer)
    field(:db_user, :string)
    field(:db_user_alias, :string)
    field(:is_manager, :boolean, default: false)
    field(:mode_type, :string)
    field(:pool_size, :integer)
    belongs_to(:tenant, Supavisor.Tenants.Tenant, foreign_key: :tenant_external_id, type: :string)
  end
end
