Or use SetThreadContext() if you want to be hardcore (not recommended).
Why not recommended? As far as things close to signals go, this is how you implement signals in user land on Windows, together with suspending and resuming the thread (SuspendThread/ResumeThread). You can even take locks later, inside the injected code, as long as you also took them before sending the signal. These are exactly the same restrictions as fork, actually. Unfortunately, atfork hooks are not accessible, and in my experience with all the popular libcs they are often full of fork-unsafe data races and deadlock bugs themselves.
I’ve implemented them as you describe, but it’s still a bit hacky because of all the corner cases. For example, what if your target thread is currently executing in the kernel?
The special APC is nicer because the OS is then aware of what you’re doing: it performs the user-mode stack changes while transitioning back to user mode and handles cleanup once the APC queue is drained.