> I cannot think of a situation where I could convince my team to allow 100k inserts to be pooled up waiting in memory for a process to crash or a deployment to wipe it out, but having a single buffer is also an amazing mistake. io_uring is a refinement of a very old idea of having a read and a write buffer that swap back and forth between two threads or tasks.
Good point! I didn't mention this in the article, but when we use this strategy we either buffer messages consumed from a message queue, or, if buffering from a handler, the max capacity on the buffer slows the rate of writes to exert backpressure on the client (a variation of the built-in flow control in tools like RabbitMQ), with a maximum context timeout.
Also, the buffers don't pool 100k rows at a time -- the 100k rows were single-connection benchmarks (before adding buffers). It's important that buffers stay small; there's no need for them to be larger, as I mention later on.
Excited to try out PG 18 with built-in support for async I/O!
> What is your local machine? Max parallelism is often a function of CPU count, so knowing how many cores the box has is useful, and might explain needing to do more tests between 20 and 30.
2023 MacBook Pro, Apple M3 Max chip.
There's a great Laurenz Albe article linked in the blog which gives the following formula for connections:
connections < min(num_cores, parallel_io_limit) / (session_busy_ratio * avg_parallelism)
The article: https://www.cybertec-postgresql.com/en/estimating-connection...

> 2023 MacBook Pro, Apple M3 Max chip.
Then you're likely measuring how fast PG updates its in-memory buffers rather than actual writes to disk. I can't find links to the discussions, but people have mentioned that desktop OSes and consumer-grade SSDs can delay writes to get more performance. This is what ChatGPT has to say about it:
Historically and even in recent versions, macOS has had issues where fsync() does not guarantee data is truly flushed to persistent storage (e.g., SSD or spinning disk). This is due to:

• Disk write caching: macOS may acknowledge fsync() even if the data is just in the drive’s volatile cache (not flushed to physical storage).

• APFS quirks: The Apple File System (APFS) has been reported to coalesce or delay flushes more aggressively than other filesystems (like ext4 or xfs).

• SSD controller behavior: Even if macOS passes the flush through, the SSD itself may lie unless it supports FLUSH_CACHE and has power-loss protection.
So 14 cores vs 20 or 30 connections. What did you get when running that query?
With these queries, `session_busy_ratio` is just going to be 1, so the formula most likely trivially evaluates to 14, assuming we're CPU-bound on writes.
It's slightly more interesting on real data -- I just checked against one of our other databases, where the ratio evaluates to 0.8, so we'd land at about 18 connections.