> I'm sure one could solve that more generally, by putting the agent writing the code in a loop with some other code reviewing agent.
This x 100. I get so much better quality code if I have LLMs review each other's code and apply corrections. It is ridiculously effective.
Can you elaborate a little more on your setup? Are you manually copying and pasting code from one LLM to another, or do you have some automated workflow for this?
I have been doing this with Claude Code and OpenAI Codex and/or Cline. One of the three takes the first pass (usually Claude Code, sometimes Codex), then I have Cline / Gemini 2.5 do a "code review" and offer suggestions for fixes before it applies them.
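If you wanted to automate that write-then-review loop, it could look roughly like the sketch below. Everything here is an assumption for illustration: `review_loop`, `stub_writer`, and `stub_reviewer` are hypothetical names, and the stubs stand in for whatever actually invokes the writing and reviewing agents. None of it is the real API of any of those tools.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: in a real setup these would wrap calls to
# whichever coding agents you use. They are NOT any tool's actual API.
Writer = Callable[[str, str], str]    # (task, feedback) -> code
Reviewer = Callable[[str, str], str]  # (task, code) -> feedback ("" means no issues)


def review_loop(
    task: str,
    writer: Writer,
    reviewer: Reviewer,
    max_rounds: int = 3,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Alternate writer and reviewer until the reviewer has no feedback
    or max_rounds is reached. Returns the last code and the full history."""
    feedback = ""
    code = ""
    history: List[Tuple[str, str]] = []
    for _ in range(max_rounds):
        code = writer(task, feedback)    # first pass, or a revision using feedback
        feedback = reviewer(task, code)  # second agent reviews the result
        history.append((code, feedback))
        if not feedback:                 # reviewer is satisfied, stop early
            break
    return code, history


# Stub agents so the sketch runs without any LLM (illustrative only).
def stub_writer(task: str, feedback: str) -> str:
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"  # deliberately buggy first draft


def stub_reviewer(task: str, code: str) -> str:
    if "a - b" in code:
        return "add() subtracts; use + instead of -"
    return ""


if __name__ == "__main__":
    final_code, rounds = review_loop("write add(a, b)", stub_writer, stub_reviewer)
    print(final_code)
    print(f"rounds: {len(rounds)}")
```

The `max_rounds` cap is there because two agents can keep trading nitpicks indefinitely; a human glance at the history before applying anything is probably still worth it.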