Oh, that's much clearer now. Hints of such refactorings are certainly within reach of today's AI tools (if you're willing to send your code to an LLM). Have you tried asking Cursor/Windsurf this question with a prompt similar to what you've just written above?
BTW, it might be an interesting feature for Noesis if this kind of analysis needs to happen during regular scans. Thanks for the tip ;)
Yes, I've tried Cursor. Currently it gives 1) high-level suggestions if I ask about architecture, which may be valid but don't solve the problem of refactoring a large existing codebase to make architectural changes, or 2) specific improvements to very simple functions, but it falls badly short at 3) actually implementing improvements, because it lacks the context of the product and what "makes sense" as tradeoffs and choices.

There are a lot of cases where, for us, "correctness" is a property of the data calculations rather than of code validity, and where unit/integration tests don't exist and aren't trivial to generate. It's counterproductive to make something run faster if it then returns the wrong results. Or a team member could look at a task/function and reason "this feature that does X should actually be doing Y", but that isn't something the AI can reason about in practice. In those cases you'd ideally change the function without relying on tests, because you actually want the behavior to change.

Small example: a feature is not performant, and rather than just making that feature faster, the better solution is to switch to a different library that we already added elsewhere in the codebase for the same kind of work.
Also, while Cursor can now scan terminal and server logs for errors, it doesn't come hooked up "out of the box" to app performance profiling tools, even just running locally. There are probably MCP servers or something for that, but I haven't set any up. Really, you'd want the IDE agent to have a feedback loop along the lines of "optimize {speed, resource usage} subject to the constraint of {unit/integration tests}" and let it run asynchronously or overnight. (Of course, there are plenty of times an LLM will work itself into a dead-end loop, and it would be bad to keep generating LLM API calls on a dead end indefinitely overnight.)
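To make the loop concrete, here's roughly what I have in mind as a Python sketch. None of the specifics are from a real setup: pytest, a bench.py script, and git are just stand-ins, and propose_patch() is a placeholder for whatever agent/LLM call would actually do the editing.

    # Sketch of the "optimize subject to tests" loop described above.
    # Assumptions: pytest as the test suite, a bench.py script whose
    # wall-clock time is the metric, git for rollback, and propose_patch()
    # as a stub for the LLM/agent step.
    import subprocess
    import time

    MAX_ITERATIONS = 20   # hard cap so a dead-end loop can't burn API credits all night
    MIN_SPEEDUP = 1.05    # only keep changes that are at least 5% faster

    def tests_pass() -> bool:
        """Constraint: the existing unit/integration tests must still pass."""
        return subprocess.run(["pytest", "-q"]).returncode == 0

    def benchmark_seconds() -> float:
        """Objective: wall-clock time of a representative workload."""
        start = time.perf_counter()
        subprocess.run(["python", "bench.py"], check=True)
        return time.perf_counter() - start

    def propose_patch(goal: str) -> None:
        """Placeholder for the agent step: ask an LLM to edit the working
        tree toward `goal`. This is where Cursor, an MCP tool, or your own
        API call would plug in."""
        raise NotImplementedError

    def revert_working_tree() -> None:
        """Throw away the last attempt (assumes the repo started clean)."""
        subprocess.run(["git", "checkout", "--", "."], check=True)

    def optimize_overnight() -> None:
        baseline = benchmark_seconds()
        for i in range(MAX_ITERATIONS):
            propose_patch(
                f"Make bench.py at least {(MIN_SPEEDUP - 1):.0%} faster "
                "without breaking the tests"
            )
            if tests_pass():
                new_time = benchmark_seconds()
                if baseline / new_time >= MIN_SPEEDUP:
                    subprocess.run(
                        ["git", "commit", "-am",
                         f"perf attempt {i}: {baseline:.2f}s -> {new_time:.2f}s"],
                        check=True,
                    )
                    baseline = new_time
                    continue
            # Constraint violated or no meaningful speedup: discard the attempt.
            revert_working_tree()

    if __name__ == "__main__":
        optimize_overnight()

The hard iteration cap is the crude guard against the dead-end problem; a real version would probably also want a token/cost budget and a log of rejected attempts.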