levkk 8 days ago

Hey Jake!

I would love to collaborate. Email me: [email protected]. Config management is a solved problem, we can use K8s or any number of CD tools. PgDog config reloading can be synchronized.

Best effort transactions for schema changes across shards are working today. Ideally, schema changes are idempotent so it's safe to retry in case of failure. Otherwise, we can try 2-phase commit. It'll need a bit of management to make sure they are not left uncommitted (they block vacuum).

Shard splitting can be done with logical replication. I've done this at Instacart with 10TB+ databases. At that scale, you need to snapshot it with a replication slot open, restore to N instances, delete whatever data doesn't match the shard #, and re-sync with logical replication. Another thing I wanted to try was using Pg 17 logical replication from streaming replicas. I feel like it's possible to parallelize resharding with like 16 replicas, without affecting the primary. In that situation, it might be feasible to just COPY tables through foreign tables with postgres_fdw or PgDog (my choice of sharding function was very intentional). Something to consider.

pg_query.rs seems to be mutable now, as far as I can tell. I've been rewriting and generating brand new queries. I like that it's 100% Postgres parser. That's super important. They have the "deparse" method (struct -> SQL) on pretty much every NodeEnum, so I think it's good to go for doing more complex things.

Lev

1
imbradn 7 days ago

Logical replication row filtering is a much better alternative to the ship and drop approach.

levkk 7 days ago

Interesting. I've been thinking about it as well. I wrote some code to prototype it here: https://github.com/pgdogdev/pgdog/blob/main/pgdog/src/backen...

imbradn 7 days ago

We used it to shard a database at Instacart after your time instead of the “replicate everything and drop the data on the shard approach”. That combined with dropping all the indexes (minus the primary key) worked well for the data copy/initial sharding.

levkk 7 days ago

That's good to know. Don't hesitate to reach out if you ever want to chat about this stuff. My email: [email protected]