tekacs 8 days ago

Over the last two days, I've built out support for autonomy in Aider (a lot like Claude Code) that hybridizes with the rest of the app:

https://github.com/Aider-AI/aider/pull/3781

Edit: In case anyone wants to try it, I uploaded it to PyPI as `navigator-mode`, until (and if!) the PR is accepted. By 'I', I mean that it uploaded itself. You can see the session where it did that here: https://asciinema.org/a/9JtT7DKIRrtpylhUts0lr3EfY

Edit 2: And as a Show HN, too: https://news.ycombinator.com/item?id=43674180

And because Aider's already an amazing platform even without the autonomy, it's very easy to use the rest of Aider's options alongside it -- like using `/ask` first, or `/code` and `/architect` for specific tasks [1]. But if you start in `/navigator` mode (which I built, here), you can just... ask for a particular task to be done and... wait, and it'll often 'just get done'.

It's... decidedly expensive to run an LLM this way right now (Gemini 2.5 Pro is your best bet), but if it's $N today, I don't doubt that it'll be $0.N by next year.

I don't mean to speak in meaningless hype, but I think that a lot of the folks speaking to LLMs' 'inability' to do things are also spending relatively cautiously on them, when tomorrow's capabilities are often already here -- just pricey.

I'm definitely still intervening as it goes (as in the Devin demos, say), but I'm also having LLMs relatively autonomously build out large swathes of functionality -- the kind that I would put off or avoid without them. I wouldn't call it a programmer replacement any time soon (it feels far from that), but I'm now finishing, solo, architectures that I know how to build but where delegating them to a team of senior devs would've resulted in chaos.

[1]: Also, for anyone who hasn't tried it and doesn't like TUIs, do note that Aider has a web mode and a 'watch mode', where you can use your normal editor and, if you leave a comment like '# make this darker ai!', Aider will step in and apply the change. This is even fancier with navigator/autonomy.
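For instance, with watch mode enabled (`aider --watch-files`), a trailing 'ai!' comment in any watched file is enough to trigger an edit -- the file and names here are made up:

    # colors.py -- any file in the repo that Aider is watching
    DARK_BG = "#222222"
    LIGHT_BG = "#fafafa"  # make this darker ai!

On save, Aider picks up the comment, applies the change, and removes the marker.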

nico 8 days ago

> It's... decidedly expensive to run an LLM this way right now

Does it work ok with local models? Something like the quantized deepseeks, gemma3 or llamas?

tekacs 8 days ago

It does for me, yes -- models seem to be pretty capable of adhering to the tool call format, which is really all that they 'need' in order to do a good job.

I'm still tweaking the prompts (and I've introduced a new, tool-call-based edit format as a primary replacement for Aider's usual SEARCH/REPLACE, which is easier for LLMs to use in some ways and harder in others -- but it lets them better express e.g. 'change the name of this function').
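To make the difference concrete, here's a rough sketch -- the tool name and argument schema below are illustrative guesses, not the exact format in the PR:

    # Aider's classic SEARCH/REPLACE asks the LLM to emit a literal diff block:
    #
    #     <<<<<<< SEARCH
    #     def fetch_user(conn, id):
    #     =======
    #     def fetch_user_by_id(conn, id):
    #     >>>>>>> REPLACE
    #
    # A tool-call edit format expresses the same intent as structured arguments,
    # e.g. (hypothetical schema):
    edit_call = {
        "name": "ReplaceText",
        "arguments": {
            "file_path": "db.py",
            "find_text": "def fetch_user(",
            "replace_text": "def fetch_user_by_id(",
            "expected_matches": 1,  # fail loudly if the target is ambiguous
        },
    }

Structured arguments are harder to emit free-form, but easier to validate and to apply precisely.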

So... if you have any trouble with it, I would adjust the prompts (in `navigator_prompts.py` and `navigator_legacy_prompts.py` for non-tool-based editing). In particular, when I adopted more 'terseness and proactively stop' prompting, weaker LLMs started stopping prematurely more often. That style helps powerful thinking models (like Sonnet and Gemini 2.5 Pro), but for smaller models I might need to provide an extra set of prompts that lets them roam more.

cyanydeez 6 days ago

So I understand how these prompts work for tooling, etc., but they tend to be model-specific. Could you supply, say, 10 prompts for the same tool and determine which one gets the correct output? It wouldn't be much harder than having some test cases and running each prompt through the user-selected model to see which worked.

Otherwise you're at the mercy of whatever model the user has selected or downloaded -- and of whenever you need to tweak a prompt to improve something.

This would be akin to how we used to calibrate styluses or touch screens.
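A minimal sketch of that calibration loop, assuming you supply a `run_model(prompt, case)` call for the user's selected model and a `passes(output, case)` checker (both hypothetical):

    # Score each candidate prompt against a small test suite and keep the winner.
    def calibrate(prompt_variants, test_cases, run_model, passes):
        def score(prompt):
            return sum(passes(run_model(prompt, case), case) for case in test_cases)
        return max(prompt_variants, key=score)

You'd run this once per newly selected model and cache the winning prompt.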

regularfry 8 days ago

Since you've got the aider hack session going...

One thing I've had in the back of my brain for a few days is the idea of LLM-as-a-judge over a multi-armed bandit, testing out local models. Locally, if you aren't too fussy about how long things take, you can spend all the tokens you want. Running head-to-head comparisons is slow, but with a MAB you're not doing so for every request. Nine times out of ten it's the normal request cycle. You could imagine having new models get mixed in as and when they become available, able to take over if they're genuinely better, entirely behind the scenes. You don't need to manually evaluate them at that point.
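As a sketch of the bandit side (epsilon-greedy for simplicity; `run(model, request)` and the `judge` call are stand-ins you'd wire up yourself):

    import random

    class ModelBandit:
        """Epsilon-greedy bandit over local models: most requests go to the
        current best model; occasionally two candidates run head-to-head and
        an LLM judge's verdict updates the stats."""

        def __init__(self, models, epsilon=0.1):
            self.models = list(models)
            self.epsilon = epsilon
            self.wins = {m: 0 for m in self.models}
            self.trials = {m: 1 for m in self.models}  # start at 1: no div-by-zero

        def best(self):
            return max(self.models, key=lambda m: self.wins[m] / self.trials[m])

        def serve(self, request, run, judge):
            if random.random() >= self.epsilon:
                return run(self.best(), request)  # the normal request cycle
            a, b = random.sample(self.models, 2)  # slow head-to-head comparison
            out_a, out_b = run(a, request), run(b, request)
            winner, output = (a, out_a) if judge(request, out_a, out_b) else (b, out_b)
            for m in (a, b):
                self.trials[m] += 1
            self.wins[winner] += 1
            return output

New models would just be appended to `self.models` with fresh stats and left to earn their place.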

I don't know how well that gels with aider's modes; it feels like you want to be able to specify a judge model but then have it control the other models itself. I don't know if that's better within aider itself (so it's got access to the added files to judge a candidate solution against, and can directly see the evaluation) or as an API layer between aider and the vllm/ollama/llama-server/whatever service, with the complication of needing to feed scores out of aider to stoke the MAB.

You could extend the idea to generating and comparing system prompts. That might be worthwhile but it feels more like tinkering at the edges.

Does any of that sound feasible?

tekacs 7 days ago

It's funny you say this! I was adding a tool just earlier (that I haven't yet pushed) that allows the model to... switch model.

Aider can also have multiple models active at any time (the architect, editor and weak models are the standard set) and use them for different aspects. I could definitely imagine switching one model whilst leaving another active.

So yes, this definitely seems feasible.
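For flavour, a hypothetical sketch of such a tool's body -- the names here are made up, not what the commit below actually does:

    # Hypothetical: a tool the LLM can invoke to swap only the main model,
    # leaving the editor and weak models in place. `coder` stands in for the
    # active Aider session and `load_model` for Aider's model constructor.
    def tool_switch_model(coder, load_model, model_name):
        coder.main_model = load_model(model_name)
        return f"Main model switched to {model_name}"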

Aider had a fairly coherent answer to this question, I think: https://gist.github.com/tekacs/75a0e3604bc10ea88f9df9a909b5d...

This was navigator mode + Gemini 2.5 Pro's attempt at implementing it, based only on pasting in your comment:

https://asciinema.org/a/EKhno9vQlqk9VkYizIxsY8mIr

https://github.com/tekacs/aider/commit/6b8b76375a9b43f9db785...

I think it did a fairly good job! It took just a couple of minutes, and it effectively just switches the main model based on recent input -- but I don't doubt that this could become really robust if I poked or prompted it further with preferences, ideas, beliefs and pushback! I imagine you could get it there very quickly if you wished.

It's definitely not showing off the most here, because it's almost all direct-coding, very similar to ordinary Aider. :)

gandalfgeek 8 days ago

Very cool. Even cooler to see it upload itself!!