amluto 7 days ago

Plain int3 is a footgun: the CPU does not keep track of the address of the int3 (at least not until FRED), and it reports the address after int3. It’s impossible to reliably undo that in software, and most debuggers don’t even try, and the result is a failure to identify the location of the breakpoint. It’s problematic if the int3 is the last instruction in a basic block, and even worse if the optimizer thinks that whatever is after the int3 is unreachable.

If Rust’s standard library does this, please consider using int3;nop instead.

JoshTriplett 7 days ago

Good to know! I've seen the pattern of "int3; nop" before, but I've never seen the explanation for why. I'd always assumed it involved the desire to be able to live-patch a different instruction over it.

In Rust, we're using the `llvm.debugtrap` intrinsic. Does that DTRT?

rep_lodsb 7 days ago

The "canonical" INT 3 is a single byte opcode (CCh), so the debugger can just subtract 1 from the address pushed on the stack to get the breakpoint location.

There is another encoding (CD 03), but no assembler should emit it. It used to be possible for adversarial code to confuse debug interrupt handlers with this, but this should be fixed now.
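
To make the subtract-one rule concrete, here is a minimal Rust sketch. The function name `breakpoint_address` and the idea of handing it the code as a byte slice plus a base address are assumptions made for illustration, not how any particular debugger is structured. It only accepts the one-byte CCh encoding, and it is a heuristic: a CCh byte can also be the tail of a longer instruction (for example the immediate in `mov al, 0xCC`).

```rust
/// Hypothetical helper: given the bytes of a code region, the address the
/// region is loaded at, and the address the CPU reported after a vector 3
/// trap, guess where the breakpoint instruction itself lives.
///
/// Returns `None` when the byte just before the reported address is not the
/// one-byte INT 3 opcode (0xCC), or when the address is outside the region.
pub fn breakpoint_address(code: &[u8], base: u64, reported_rip: u64) -> Option<u64> {
    // Offset of the reported address inside `code`.
    let off = reported_rip.checked_sub(base)? as usize;

    // There must be at least one byte before the reported address, and the
    // reported address may be at most one past the end of the region.
    if off == 0 || off > code.len() {
        return None;
    }

    // Canonical encoding: a single 0xCC byte immediately before the
    // reported address, so the breakpoint is at reported_rip - 1.
    if code[off - 1] == 0xCC {
        Some(reported_rip - 1)
    } else {
        None
    }
}
```

With the bytes `90 CC 90` loaded at 0x1000 and a reported address of 0x1002, this yields 0x1001. With `CD 03 90` and the same reported address it yields `None`, because subtracting 1 would land in the middle of the two-byte instruction.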

amluto 7 days ago

This would involve the debugger actually being structured in a way that makes this make sense. A debugger like GCC has a gnarly data structure that represents the machine state, and it contains things like EIP/RIP. There is a command 'backtrace' that takes the machine state and attempts to generate a backtrace. And there's a command 'continue' that resumes execution.

int3 is a "trap". continue will resume execution at the instruction after int3, as intended. But backtrace should, by some ill-defined magic, generate the backtrace as though RIP was (saved RIP - 1). And the condition for doing this isn't something that is (AFAIK) representable at all in GCC's worldview. Sure, GCC knows, or at least ought to know [0], that it gained control because of vector 3, and the Intel and AMD manuals say that vector 3 is a trap. But there isn't a bit in memory or anything you would see in 'info regs' that will say "hey, this is a 'trap', and backtraces and such should be done as though RIP was actually RIP-1".

Maybe the right solution would be to split the program counter, from the perspective of the debugger, into two fields: program counter for backtracing, and program counter for resumption.
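
A rough Rust sketch of what that split could look like. Everything here (`StopKind`, `StopPc`, `split_pc`) is a made-up name for illustration and does not correspond to anything in an existing debugger. The only architectural fact it relies on is that x86 traps save the address of the next instruction, while faults save the address of the faulting instruction.

```rust
/// How the debuggee stopped, in x86 exception terms (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopKind {
    /// Saved RIP points after the instruction that caused the stop (e.g. int3).
    Trap,
    /// Saved RIP points at the instruction that caused the stop.
    Fault,
}

/// The program counter as two separate fields (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StopPc {
    /// Where `continue` should resume execution.
    pub resume: u64,
    /// Which address backtraces and symbol lookup should attribute the stop to.
    pub backtrace: u64,
}

/// Derive both program counters from the saved RIP.
/// `insn_len` is the length of the trapping instruction (1 for the CCh int3);
/// it is ignored for faults.
pub fn split_pc(saved_rip: u64, kind: StopKind, insn_len: u64) -> StopPc {
    match kind {
        StopKind::Trap => StopPc {
            resume: saved_rip,
            backtrace: saved_rip.wrapping_sub(insn_len),
        },
        StopKind::Fault => StopPc {
            resume: saved_rip,
            backtrace: saved_rip,
        },
    }
}
```

For a one-byte int3 at 0x401000, the saved RIP is 0x401001, so `split_pc(0x401001, StopKind::Trap, 1)` gives `resume == 0x401001` and `backtrace == 0x401000`.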

And yes, I know that GCC gets this wrong. Been there, seen the failures. I have not checked, but I expect that LLDB works exactly like GCC in this regard.

[0] ptrace on Linux exposes the vector number, somewhat awkwardly. Or you can infer it from the fact that the signal was SIGTRAP.

rep_lodsb 7 days ago

I assume you meant GDB, not GCC, right?

Seems like a deficiency in GDB (and maybe LLDB too), not in the kernel or x86.

amluto 7 days ago

I do mean GDB. Whoops.

Deficiency or not, it breaks debugging. I’m willing to pay a cost of one byte per breakpoint as a workaround.

And GDB has far more outrageous, if less-frequently hit, bugs in its architectural state handling. I’m not holding my breath for a fix.