The current challenge is not to create a patch, but to verify it.
Testing a fix in a large application is a complex task. First, you have to reproduce the issue and verify the steps (or write them yourself, because many issues don't contain a clear description). Then you switch to the fixed version and make sure the issue no longer exists. Finally, you do some exploratory testing to make sure the fix hasn't broken neighbouring logic (which requires deep knowledge of the application).
To perform these steps you have to deploy a staging environment with the original and fixed versions, or run everything locally and do the pre-setup (create users, entities, etc. to reach the broken state).
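Roughly, the flow I mean looks like the sketch below — a hypothetical REST API, with invented endpoint paths, payloads, and staging URL; only the pre-setup → reproduce → verify shape matters:

```python
# A minimal sketch, assuming a hypothetical API: everything named here is
# made up, the point is the shape of the check.
import requests

BASE = "https://staging.example.com"  # deployment of the version under test

def setup_state(session):
    # Pre-setup: create the user and entity needed to reach the broken state.
    session.post(f"{BASE}/api/users", json={"name": "repro-user"})
    session.post(f"{BASE}/api/orders", json={"user": "repro-user", "qty": 0})

def test_issue_is_fixed():
    s = requests.Session()
    setup_state(s)
    # On the original version this request returned a 500 (the reported bug);
    # against the patched build we expect a clean 200.
    resp = s.get(f"{BASE}/api/orders", params={"user": "repro-user"})
    assert resp.status_code == 200
```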
This is a very challenging area for current agents. Right now they simply can't do these steps: they aren't ready for that level of integration with the app and the infrastructure. And producing 3/5/10/100 unverified pull requests only slows down the software development process.
All the things you describe are already being done by any team with a modern CI/CD workflow, and none of it requires AI.
At my last job, all of those steps were automated and required exactly zero human input.
Are you sure about "all"? Because I mentioned not only environment deployment, but also functional issue reproduction through the UI/API, which also requires the necessary pre-setup.
Automated tests partially cover this, but in the real world no one writes tests blindly. It's always manual work first, and the test gets written once the failing path is clear.
In theory an agent can interact with the UI or the API. But that requires deep project understanding, gathered from code, documentation, git history, tickets, and Slack. And obtaining this context, building an easily accessible knowledge base, and pouring only the necessary parts into the agent's context is still an unsolved task.
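To illustrate what I mean by "pouring only the necessary parts" into the context — a toy sketch, where the keywords, docs layout, and chunking are all made up:

```python
# A toy sketch of context curation: gather candidate knowledge from git history
# and docs, keep only chunks that mention the ticket's keywords, and hand just
# that to the agent. Paths and keywords are illustrative, not a real pipeline.
import subprocess
from pathlib import Path

def recent_commits(n=200):
    out = subprocess.run(["git", "log", f"-{n}", "--oneline"],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

def relevant_context(keywords, docs_dir="docs"):
    # Commits whose messages touch the affected area.
    chunks = [c for c in recent_commits()
              if any(k in c.lower() for k in keywords)]
    # Documentation paragraphs that mention the same terms.
    for path in Path(docs_dir).rglob("*.md"):
        for para in path.read_text().split("\n\n"):
            if any(k in para.lower() for k in keywords):
                chunks.append(f"{path}: {para.strip()}")
    return "\n".join(chunks)  # this, not the whole repo, goes into the prompt

print(relevant_context(["pagination", "cursor"]))
```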
If your CI/CD process was able to fully verify a fix then it would have stopped the bug from making it to production the first time around and the Jira ticket which was handed to multiple LLMs never would have existed.
There is no fundamental blocker to agents doing all those things. It's mostly a matter of constructing the right tools and grounding, which can be a fair amount of up-front work. Arming LLMs with the right tools and documentation got us this far. There's no reason to believe that path is exhausted.
Look at this 18-year-old Django ticket: https://code.djangoproject.com/ticket/4140
It wasn't impossible to fix, but it required some experimentation and deep research into very specific behaviors.
Or this ticket: https://code.djangoproject.com/ticket/35289
The author proposed a one-line solution, but the ensuing discussion includes an analysis of the relevant RFC, potential negative outcomes, and different ways to fix it.
And without a deep understanding of the project, it's not clear how to fix it properly without damaging backward compatibility or neighbouring functionality.
Such a fix also has to be properly tested manually, because even well-designed automated tests don't match the actual flow 100%.
You can explore the other open and closed issues and the corresponding discussions. This is the complexity level of real software, not of pet projects or simple apps.
I'd guess the existing attention mechanism is the fundamental blocker, because it's barely able to process all the context required for a fix.
And feature requests are much, much more complex.
Have you tried building agents? They will go from PhD level smart to making mistakes a middle schooler would find obvious, even on models like gemini-2.5 and o1-pro. It's almost like building a sandcastle where once you get a prompt working you become afraid to make any changes because something else will break.
> Have you tried building agents?
I think the issue right now is so many people want to believe in the moonshot and are investing heavily in it, when the reality is we should be focusing on the home runs. LLMs are a game changer, but there is still A LOT of tooling that can be created to make it easier to integrate humans in the loop.
You can even just tell Cursor to use whatever CLI tools you normally use in your development: git, gh, railway, vercel, node debugging, etc.
Tools are not the problem. Knowledge is.
> Tools are not the problem. Knowledge is.
This is the most difficult concept to convey, expressed in a succinct manner rarely found.
Correct! Over at https://ghuntley.com/mcp I propose that each company develop its own tools for its particular codebase, shaping how the LLM works with that codebase.
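As a rough sketch of what one such tool could look like, assuming the Python MCP SDK's FastMCP helper — the server name, tool, and test layout are invented:

```python
# A hedged sketch of a codebase-specific MCP tool: it exposes "run the tests the
# way this repo expects" so the LLM doesn't have to rediscover the convention.
import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("acme-codebase")  # hypothetical company codebase server

@mcp.tool()
def run_scoped_tests(module: str) -> str:
    """Run the test suite for a single module using this repo's conventions."""
    result = subprocess.run(["pytest", f"tests/{module}", "-q"],
                            capture_output=True, text=True)
    return result.stdout + result.stderr

if __name__ == "__main__":
    mcp.run()
```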