Item 42198764

amluto • 7 days ago

This would involve the debugger actually being structured in a way that makes this make sense. A debugger like GCC has a gnarly data structure that represents the machine state, and it contains things like EIP/RIP. There is a command 'backtrace' that takes the machine state and attempts to generate a backtrace. And there's a command 'continue' that resumes execution.

int3 is a "trap". continue will resume execution at the instruction after int3, as intended. But backtrace should, by some ill-defined magic, generate the backtrace as though RIP was (saved RIP - 1). And the condition for doing this isn't something that is (AFAIK) representable at all in GCC's worldview. Sure, GCC knows, or at least ought to know [0], that it gained control because of vector 3, and the Intel and AMD manuals say that vector 3 is a trap. But there isn't a bit in memory or anything you would see in 'info regs' that will say "hey, this is a 'trap', and backtraces and such should be done as though RIP was actually RIP-1".

Maybe the right solution would be to split the program counter, from the perspective of the debugger, into two fields: program counter for backtracing, and program counter for resumption.

And yes, I know that GCC gets this wrong. Been there, seen the failures. I have not checked, but I expect that LLDB works exactly like GCC in this regard.

[0] ptrace on Linux exposes the vector number, somewhat awkwardly. Or you can infer it from the fact that the signal was SIGTRAP.

rep_lodsb • 7 days ago

I assume you meant GDB, not GCC, right?

Seems like a deficiency in GDB (and maybe LLDB too), not in the kernel or x86.

1 reply

amluto • 7 days ago

I do mean GCC. Whoops.

Deficiency or not, it breaks debugging. I’m willing to pay a cost of one byte per breakpoint as a workaround.

And GDB has far more outrageous, if less-frequently hit, bugs in its architectural state handling. I’m not holding my breath for a fix.