> So I do a “coding review” session. And the horror ensues.
Yup. I've spoken about this on here before. I was a Cursor user for a few months. Whatever efficiency gains I "achieved" were instantly erased in review, as we uncovered all the subtle and not-so-subtle bugs it produced.
Went back to vanilla VSCode and still use Copilot, but only when I prompt it to do something specific (scaffold a test, write a migration with these columns, etc.).
Cursor's tab complete feels like magic at first, but the shine wore off for me.
> Cursor's tab complete feels like magic at first, but the shine wore off for me.
My favorite thing, watching a co-worker, is when Cursor tries to tab-complete exactly what he just removed, and sometimes he accepts it by reflex.
What kind of guardrails did you give the agent? Like following SOLID, linting, 100% code coverage, templates, architectural documents before implementing, architectural rules, DRY cleanup cycles, code review guidelines (including strict rules around consistency), review by another LLM, etc.?
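For anyone wondering what these look like in practice: a minimal sketch of a project rules file (e.g. Cursor's plain-text `.cursorrules`; the specific rules below are illustrative, not from the OP):

```
# .cursorrules — project guardrails the agent must follow (illustrative example)

- Follow SOLID principles; one responsibility per class/module.
- Run the project linter config before proposing code; never disable lint rules inline.
- Every new function must ship with unit tests; do not lower existing coverage.
- Read docs/architecture.md before implementing; do not introduce new layers or dependencies without flagging it.
- Prefer extending existing helpers over duplicating logic (DRY); propose a cleanup pass when you spot duplication.
- Match the surrounding code style exactly: naming, error handling, import ordering.
```

How well the agent actually honors a file like this is exactly what the replies below are debating.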
Not the OP, but in my experience LLMs still don't reliably follow guardrails. They might hold for 25-50% of sessions, but it varies wildly.