I had to scroll a very long way to get to the most important bit:
> Running rr record/replay without access to CPU HW performance counters is accomplished using lightweight dynamic (and static) instrumentation. The Software Counters mode rr wiki has more details in case you're curious about some more of the internals.
You should move that to the top.
I have tried SO hard to get rr to work for me, including buying a separate PC just to use it... but it consistently fails, so I've basically abandoned it. Something like this would absolutely be a godsend. Just getting something consistently working with Ubuntu is amazing. Does this approach make working in something like WSL viable?
I would love it if this were upstreamed. Is there a GitHub issue where you discuss the possibility of this with the rr devs? That might be something to add to your README for everyone else who wants to follow along. Thanks!
Thanks for the encouraging words! Please do try it out and report back on the issue tracker whether or not it worked well for you.
With sufficient usage I think we can make a good case for getting this merged upstream. This patch introduces dynamic/static instrumentation for ticks counting, which is quite different from how rr has worked until now. If there are many success stories, a stronger case for an upstream merge can be made. The rr maintainers are aware of this project, but it is early days yet for an upstream merge PR attempt.
With a big changeset, it's better to have a brief discussion about how it works / what it needs before you actually make a PR. Just big-principles, high-level stuff. This way, if you build a train station, the devs won't be like "ooh, we really need an airport." That's why an issue to track it is good: it raises visibility for anyone who has an issue with the approach etc. long before it's time to make a merge. Also, if they're like "we'll never take this" or "we'll take this if you build a space station", it's good to know that before investing a ton of time into something PR-able.
Very cool. It’s difficult to praise rr too much, and it just keeps getting better. If you’re not using it, you’re missing out on a superpower.
Well! That eliminates one of, if not the, biggest problems with rr :) Is there some catch or tradeoff? Performance, maybe?
Author here. Yes, there are some tradeoffs:
- Performance: slower than using rr with HW counters. Both the dynamic and the static instrumentation employed by _Software Counters mode_ rr slow things down.
- Potential fragility: Dynamic and static instrumentation can often make record/replay a bit more fragile.
- Currently only x86-64 support has been publicly released. I have aarch64 support working reasonably well internally, and it allows me, for instance, to run rr in a Linux VM running on macOS! I have yet to figure out my strategy for the aarch64 release, so watch https://github.com/sidkshatriya/rr.soft for any updates.
- Currently it can run only on a few recent Linux distributions (e.g. Fedora 40/41, Debian Unstable, Ubuntu 24.10) because it relies on robust debuginfod support, which is not widespread yet. See https://github.com/sidkshatriya/rr.soft/wiki#how-does-softwa... for why debuginfod is required. The debuginfod requirement may be relaxed in the future with more work.
Regardless of the tradeoffs, this allows rr to be used in many more situations, i.e. wherever HW performance counter access is not possible, not reliable, or broken.
I would love it if more people tried this out and let me know if things worked out well for them (or not) with their programs.
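For readers wondering what counting "ticks" in software buys you, here is a toy Python sketch of the general idea. It is not how rr or rr.soft is implemented. It assumes a tick is one executed conditional branch, and `Ticks`, `program`, `record` and `replay` are hypothetical names made up for this illustration. Events that arrive at arbitrary moments are logged against the tick count during recording, and injected at the same tick count during replay.

```python
import random


class Ticks:
    """Software tick counter, bumped by 'instrumentation' at every conditional branch."""

    def __init__(self):
        self.count = 0

    def branch(self, condition):
        # Stand-in for an instrumented branch: count it, then let it proceed.
        self.count += 1
        return condition


def program(steps, ticks, deliver_event):
    """Toy workload. deliver_event(tick) returns a value injected by an event, or 0."""
    total = 0
    i = 0
    while ticks.branch(i < steps):
        total += deliver_event(ticks.count)  # an outside event may land here
        if ticks.branch(i % 3 == 0):
            total += i
        i += 1
    return total


def record(steps, seed):
    """Run with random events and log the tick at which each one arrived."""
    rng = random.Random(seed)
    log = {}

    def deliver(tick):
        if rng.random() < 0.3:
            log[tick] = rng.randint(1, 100)
            return log[tick]
        return 0

    ticks = Ticks()
    result = program(steps, ticks, deliver)
    return result, ticks.count, log


def replay(steps, log):
    """Re-run without randomness, injecting each logged event at its recorded tick."""
    ticks = Ticks()
    result = program(steps, ticks, lambda tick: log.get(tick, 0))
    return result, ticks.count


recorded_result, recorded_ticks, event_log = record(steps=20, seed=1)
replayed_result, replayed_ticks = replay(steps=20, log=event_log)
print(recorded_ticks, replayed_ticks, recorded_result == replayed_result)
```

The point of the sketch is only that a deterministic counter gives every moment of the execution a stable name, so a replay can put events back exactly where they happened. The extra work done in `branch` on every conditional is also a rough picture of where the slowdown comes from.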
I'm assuming this changes nothing about the lack of io_uring support?
Yes, io_uring is still not supported due to fundamental issues in the overall rr architecture which my modification does not resolve. My modification only addresses the HW counter requirement of upstream rr and the other core aspects of rr remain the same.
Normal system calls transition to kernel space and then return from kernel space. They change your program's memory/process state as soon as they complete. This gives rr an easy boundary at which it can "do its thing": record memory/process state changes, or insert results (during replay).
When does an io_uring request/response complete? That's difficult to say. When using io_uring, the kernel and userspace communicate by checking a queue head or tail with memory accesses to see if something got added to or removed from the request/response ring buffers.
Think of the kernel side of io_uring and userspace as cooperating via memory. (Yes, sometimes "proper" traditional ring-crossing system calls are made, but what makes io_uring so fast is communicating via memory and not via system calls most of the time.) Anyway, all this makes it difficult for rr to intervene on the boundary between kernel and userspace, because this boundary is elusive when it comes to io_uring. The memory writes cannot be caught by ptrace! This explanation is simplified, of course.
The rr maintainers have some plans to deal with io_uring: https://github.com/rr-debugger/rr/issues/2613
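To make the "cooperating via memory" point concrete, here is a toy Python model. It is a simplification under stated assumptions, not the real io_uring API: `Ring`, `toy_kernel`, `traditional_read` and `observed_syscalls` are hypothetical names, and the "kernel" is just a function in the same process. The list `observed_syscalls` stands in for what a ptrace-style observer gets to see.

```python
class Ring:
    """Fixed-size ring buffer; producer and consumer coordinate only via head/tail."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0  # next slot the consumer will read
        self.tail = 0  # next slot the producer will write

    def push(self, item):
        if self.tail - self.head == len(self.slots):
            return False  # ring is full
        self.slots[self.tail % len(self.slots)] = item
        self.tail += 1  # a plain memory write publishes the entry
        return True

    def pop(self):
        if self.head == self.tail:
            return None  # ring is empty
        item = self.slots[self.head % len(self.slots)]
        self.head += 1
        return item


observed_syscalls = []  # what an observer sitting on the syscall boundary would see


def traditional_read(data):
    """Toy 'system call': it crosses a boundary the observer can hook."""
    observed_syscalls.append("read")
    return data


def toy_kernel(submissions, completions):
    """Stand-in for the kernel side: drain requests, post results, no syscall involved."""
    while (request := submissions.pop()) is not None:
        completions.push(("done", request))


submissions, completions = Ring(8), Ring(8)
for name in ["a", "b", "c"]:
    submissions.push(name)  # three requests, zero observable syscalls
toy_kernel(submissions, completions)

results = []
while (entry := completions.pop()) is not None:
    results.append(entry)

traditional_read("d")  # one old-style call, one observable event
print(results, observed_syscalls)
```

Three requests completed, yet the observer's log holds only the single traditional call. The ring traffic consisted entirely of index and slot updates in memory, which is the part a syscall-boundary tracer has no natural hook for.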
Will it run on the latest NixOS?
Probably not at the moment.
_Software Counters mode_ rr requires robust support for debuginfod in your Linux distribution.
To my knowledge, NixOS does not provide debuginfo for all packages via a dedicated debuginfod server. That would make debugging with _Software Counters mode_ rr unreliable for any package whose debuginfo is not available via debuginfod.
I recently learned about https://github.com/symphorien/nixseparatedebuginfod for nix (note: I haven't used it, so I don't know how reliable or good it is). Anyway, this project requires setting `separateDebuginfo = true;` on a derivation for its debuginfo to be available via debuginfod. That is an opt-in approach, whereas we need pervasive support of the kind that exists in Fedora (and some others).
Does it work with Pernosco?
I've not used Pernosco, but I am generally aware of it. My guess is that Pernosco would need to be modified to support recordings that have "soft ticks", i.e. that use Software Counters.
I don't see any compelling reason why this should _not_ be possible from a broad technical point of view. Pernosco engineers would of course be able to give a more authoritative reply.
Very nice.
Has anyone got rr working with python?
rr has always worked with Python in the sense that it can record and replay Python programs.
However, when you try to debug the program you can only debug the C code the Python interpreter is written in.
I suppose you want to be able to debug the Python code itself. Here is something that could do this: https://pypy.org/posts/2016/07/reverse-debugging-for-python-... . I don't think the project is active nowadays, though. Also, I haven't used it, so I can't say whether it is good or not.
It should be possible to build a Python reverse debugger on top of rr. I know this should be possible because I built something similar for PHP: https://github.com/sidkshatriya/dontbug .
There are other fancy (and possibly better) things that are possible -- instead of building a Python debugger atop rr, you can record the full trace of the Python program and then, e.g., store the values of important variables at each executed line of the Python program in a database. This would again use rr as the record/replay substrate, but with a slightly different approach. This is an area in which I've done some work internally, but nothing has been publicly released yet :-)
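As a rough illustration of the "values at each executed line in a database" idea, here is a minimal sketch that uses Python's own `sys.settrace` hook and SQLite instead of rr. It is not the internal work mentioned above and says nothing about how that works. The `steps` table, `trace_to_db` and `running_sum` are hypothetical names chosen for the example.

```python
import json
import sqlite3
import sys


def trace_to_db(func, *args):
    """Run func(*args), storing its local variables at every executed line in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE steps (step INTEGER PRIMARY KEY, line INTEGER, locals TEXT)"
    )
    target_code = func.__code__
    step = 0

    def tracer(frame, event, arg):
        nonlocal step
        if frame.f_code is not target_code:
            return None  # ignore frames that belong to other functions
        if event == "line":
            # A 'line' event fires just before the line runs, so this is the
            # state the line is about to see.
            step += 1
            snapshot = json.dumps(dict(frame.f_locals), default=repr, sort_keys=True)
            conn.execute(
                "INSERT INTO steps VALUES (?, ?, ?)", (step, frame.f_lineno, snapshot)
            )
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook again
    conn.commit()
    return result, conn


def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


result, db = trace_to_db(running_sum, 3)
for step, line, snapshot in db.execute("SELECT step, line, locals FROM steps"):
    print(step, line, snapshot)
```

Once the snapshots are in a table, "going backwards" is just a query, for example selecting the last step at which `total` still had some earlier value. With rr as the substrate, the recording would be made once and a database like this could be filled in during replay.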
Do the gdb commands for printing information about Python frames work with rr? E.g. py-bt, py-print, py-locals?
Any gdb integration scripts for Python to get stack frames etc. should work fine in rr and Software Counters Mode rr.
`rr replay` and (the software counters equivalent) `rr replay -W` both invoke gdb.
Patiently waiting for the day when someone makes something similar for macOS.
It's very difficult for broad-based record/replay software like rr to exist for macOS, in my opinion. macOS system interfaces are quite basic in terms of functionality compared to Linux, and increasingly locked down.
rr uses many advanced features of Linux `ptrace`. Compare `man ptrace` on Linux with that on macOS for example and you will notice that Linux gives a lot of features to `ptrace` that macOS simply does not.
There are a large number of other features required for practical record and replay -- I don't think macOS provides them either.
It's probably possible to build _some_ record/replay system on macOS with constraints, restrictions, workarounds and compromises -- never say never as they say. But I don't think it can be as capable/generic as rr on Linux.
Could this help? https://developer.apple.com/documentation/xcode-release-note...
Instruments 16.3 includes a new Processor Trace Instrument which uses hardware-supported, low-overhead CPU execution tracing to accurately reconstruct execution of the program. This tool provides metrics like duration, number of cycles, and instructions retired for every function executed on the CPU. Timeline in Instruments presents execution flame graph, while detail views provide aggregate-level data like Call Tree or aggregated metrics (min, max, count, sum), divided by function. Traces can be recorded using the new Processor Trace template on supported devices: M4 Mac, M4 iPad, and iPhone 16/16 Pro. Tracing on the device requires additional configuration in the System Settings.
Still waiting for rr to work more transparently/easily with Haskell.
rr and gdb are very DWARF-debugging focussed. As long as Haskell has only basic DWARF debugging support, I wonder how much rr/gdb can do.
Though I do see a lot of promise for the future. In the debugger, many expressions are evaluated earlier than they would be in a real Haskell program, because a user may want to inspect a value. rr can help make this premature evaluation not matter so much, because the evaluation can be executed in a diversion session.
Is this getting merged back into rr?
Tough to say right now. Here is a long answer: https://github.com/sidkshatriya/rr.soft?tab=readme-ov-file#w...