> I'm having trouble understanding how else this is supposed to be? I understand that live migration is a thing, but even in those cases, a VM is "hardwired" to some physical server, no?
You can run your workload (in this case a VM) on top of a scheduler, so if one node goes down, the workload is just spun up on another available node.
You will have downtime, but it will be limited to roughly the time it takes to notice the node is gone and restart the workload elsewhere.
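A minimal sketch of that rescheduling idea, assuming a toy heartbeat-based failure detector. `ToyScheduler`, `heartbeat_timeout`, and the least-loaded placement rule are hypothetical names chosen for illustration. This is not how Fly, Kubernetes, or any other real scheduler is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True
    last_heartbeat: float = 0.0


class ToyScheduler:
    """Hypothetical toy scheduler: places workloads on healthy nodes and
    moves them when their node misses its heartbeat deadline."""

    def __init__(self, heartbeat_timeout: float):
        self.heartbeat_timeout = heartbeat_timeout
        self.nodes: dict[str, Node] = {}
        self.placement: dict[str, str] = {}  # workload name -> node name

    def add_node(self, name: str, now: float) -> None:
        self.nodes[name] = Node(name, True, now)

    def heartbeat(self, name: str, now: float) -> None:
        # A node reporting in is considered alive again.
        node = self.nodes[name]
        node.last_heartbeat = now
        node.healthy = True

    def _pick_node(self) -> str:
        # Least-loaded healthy node; ties broken by node name.
        load = {n.name: 0 for n in self.nodes.values() if n.healthy}
        if not load:
            raise RuntimeError("no healthy nodes available")
        for node_name in self.placement.values():
            if node_name in load:
                load[node_name] += 1
        return min(load, key=lambda name: (load[name], name))

    def submit(self, workload: str) -> str:
        self.placement[workload] = self._pick_node()
        return self.placement[workload]

    def reconcile(self, now: float) -> dict[str, str]:
        # Mark nodes that missed their heartbeat deadline as down.
        for node in self.nodes.values():
            if now - node.last_heartbeat > self.heartbeat_timeout:
                node.healthy = False
        # Restart every workload from a dead node on a healthy one.
        moved = {}
        for workload, node_name in list(self.placement.items()):
            if not self.nodes[node_name].healthy:
                moved[workload] = self.submit(workload)
        return moved


sched = ToyScheduler(heartbeat_timeout=5.0)
sched.add_node("node-a", now=0.0)
sched.add_node("node-b", now=0.0)
print(sched.submit("vm-1"))  # node-a
sched.heartbeat("node-b", now=4.0)  # node-a never reports in again
print(sched.reconcile(now=6.0))  # {'vm-1': 'node-b'}
```

In this model the downtime is bounded by `heartbeat_timeout` plus however long the restart takes, which is the "limited" part. A shorter timeout recovers faster but risks moving workloads off nodes that were only briefly slow.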
> so if one goes down ... just spun up on another
On Fly, one can absolutely set this up, and there are multiple ways to do it: https://fly.io/docs/apps/app-availability / https://archive.md/SJ32K