phamilton 8 days ago

Sincere question: Has anyone figured out how we're going to code review the output of an agent fleet?

jsheard 8 days ago

Insincere answer that will probably be attempted sincerely nonetheless: throw even more agents at the problem by having them do code review as well. The solution to problems caused by AI is always more AI.

regularfry 8 days ago

Technically that's known as "LLM-as-judge" and it's all over the literature. The intuition would be that the capability to choose between two candidates doesn't exactly overlap with the ability to generate either one of them from scratch. It's a bit like how (half of) generative adversarial networks work.
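
For what it's worth, here is a minimal sketch of the pairwise version of that idea, assuming a hypothetical `judge` callable that is shown two candidate patches and answers "A" or "B". Asking twice with the order swapped is one common guard against a judge that simply favors whichever candidate it sees first. The `shorter_is_better` judge is a stand-in for illustration only; a real one would call a model.

```python
from typing import Callable, Optional

# Hypothetical judge signature: given two candidates, return "A" or "B".
Judge = Callable[[str, str], str]


def pick_winner(judge: Judge, first: str, second: str) -> Optional[str]:
    """Ask the judge twice with the order swapped; keep only a consistent verdict."""
    forward = judge(first, second)   # `first` is presented as candidate A
    backward = judge(second, first)  # `second` is presented as candidate A
    if forward == "A" and backward == "B":
        return first
    if forward == "B" and backward == "A":
        return second
    return None  # the verdict changed with position, so treat it as undecided


def shorter_is_better(a: str, b: str) -> str:
    """Stand-in judge for illustration; prefers the shorter candidate."""
    return "A" if len(a) <= len(b) else "B"
```

A judge that always answers "A" regardless of content gets `None` back from `pick_winner`, which is the point of the swap.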

brookst 8 days ago

s/AI/tech

sensanaty 7 days ago

Most of the people pushing this want to just sell an MVP and get a big exit before everything collapses, so code review is irrelevant.

lsllc 8 days ago

Simple, just ask an(other) AI! But seriously, different models are better/worse at different tasks, so if you can figure out which model is best at evaluating changes, use that for the review.
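
One rough way to "figure out which model is best at evaluating changes" is to score each candidate reviewer against changes whose correct verdict you already know. A sketch under stated assumptions: each reviewer is a hypothetical callable that takes a diff description and returns True to approve, and the labeled examples and stub reviewers below are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical reviewer signature: diff text -> True if the change is approved.
Reviewer = Callable[[str], bool]
Labeled = List[Tuple[str, bool]]  # (diff text, verdict a human reviewer gave)


def accuracy(reviewer: Reviewer, labeled: Labeled) -> float:
    """Fraction of labeled changes where the reviewer matches the human verdict."""
    hits = sum(reviewer(diff) == expected for diff, expected in labeled)
    return hits / len(labeled)


def best_reviewer(reviewers: Dict[str, Reviewer], labeled: Labeled) -> str:
    """Name of the reviewer with the highest agreement on the labeled set."""
    return max(reviewers, key=lambda name: accuracy(reviewers[name], labeled))


# Made-up examples; a real set would come from past PRs with known outcomes.
labeled_changes: Labeled = [
    ("adds unit test", True),
    ("removes auth check", False),
    ("renames variable", True),
]

# Stub reviewers standing in for different models.
stub_reviewers: Dict[str, Reviewer] = {
    "approve_all": lambda diff: True,
    "flags_removals": lambda diff: "removes" not in diff,
}
```

With these stubs, `best_reviewer(stub_reviewers, labeled_changes)` picks `"flags_removals"`, since it matches all three verdicts while `approve_all` misses one.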

phamilton 7 days ago

I suspect this will indeed be part of it, but it won't work with today's AIs on today's codebases.

Models will improve, but also I predict code style and architecture will evolve towards something easier for machine review.

nchmy 7 days ago

Sincere question: why would you not be able to code review it the same way you would review code written by humans?

phamilton 7 days ago

Agents could generate more PRs in a weekend than my team could code review in a month.

Initially we can absolutely just review them like any other PR, but at some point code review will be the bottleneck.

nchmy 3 days ago

Surely humans are the ones initiating the agent though, no? Just do that at a measured pace. And set up comprehensive prompts/mechanisms to make sure the agent satisfies your criteria for tests, style, etc. There are a lot of prompts and tools around the Cline/Roo community for doing stuff like that.

fxtentacle 8 days ago

You just don't. Choose randomly and then try to quickly sell the company. /s