On Linux you'd do this by sending a signal to the thread you want to analyze, and then the signal handler would take the stack trace and send it back to the watchdog.
The tricky part is ensuring that the signal handler code is async-signal-safe (which pretty much boils down to "ensure you're not acquiring any locks and be careful about reentrant code"), but at least that only has to be verified for a self-contained small function.
Is there anything similar to signals on Windows?
The closest thing is a special APC enqueued via QueueUserAPC2 [1], but that's relatively new functionality in user-mode.
[1] https://learn.microsoft.com/en-us/windows/win32/api/processt...
The 2 implies an older API, its predecessor QueueUserAPC has been around since the XP days.
The older API is less like signals and more like cooperative scheduling in that it waits for the target thread to be in an "alertable" state before it runs (the thread executes a sleep or a wait for something)
> The 2 implies an older API, its predecessor QueueUserAPC has been around since the XP days.
I wasn’t implying that APCs were new, I was implying that the ability to enqueue special (as opposed to normal) APCs from user-mode is new. And of course, that has always been possible from kernel-mode with NT.
Or SetThreadContext() if you want to be hardcore. (not recommended)
Why not recommended? As far as things close to signals go, this is how you implement signals in user land on Windows (along with pause/resume thread). You can even take locks later during the process, as long as you also took them before sending the signal (same exact restrictions as fork actually, but unfortunately atfork hooks are not accessible and often full of fork-unsafe data race and deadlock implementation bugs themselves in my experience with all the popular libc)
I’ve implemented them as you describe, but it’s still a bit hacky due to lots of corner cases — what if your target thread is currently executing in the kernel?
The special APC is nicer because the OS is then aware of what you’re doing— it will perform the user-mode stack changes while transitioning back to user-mode and handle cleanup once the APC queue is drained.