What kind of guardrails did you give the agent? Like following SOLID, linting, 100% code coverage, templates, architectural documents before implementing, architectural rules, DRY cleanup cycles, code review guidelines (incl strict rules around consistency), review by another LLM etc?
Not the OP, but in my experience LLMs are still not quite there on guardrails. They might be for 25-50% of sessions, but it’ll vary wildly.