jbellis 1 day ago

Yes!

Han Xiao at Jina wrote a great article that goes into a lot more detail on how to turn this into a production-quality agentic search: https://jina.ai/news/a-practical-guide-to-implementing-deeps...

This is the same principle that we use at Brokk for Search and for Architect. (https://brokk.ai/)

The biggest caveat: some models just suck at tool calling, even "smart" models like o3. I only really recommend Gemini Pro 2.5 for Architect (smart + good tool calls); Search doesn't require as high a degree of intelligence and lots of models work (Sonnet 3.7, gpt-4.1, Grok 3 are all fine).

sagarpatil 18 hours ago

“Claude Code, better than Sourcegraph, better than Augment Code.”

That’s a pretty bold claim, how come you are not at the top of this list then? https://www.swebench.com/

“Use frontier models like o3, Gemini Pro 2.5, Sonnet 3.7” Is this unlimited usage? Or number of messages/tokens?

Why do you need a separate desktop app? Why not a CLI or VS Code extension?

crawshaw 1 day ago

I'm curious about your experiences with Gemini Pro 2.5 tool calling. I have tried using it in agent loops (in fact, sketch has some rudimentary support I added), and compared with the Anthropic models I have had to actively reprompt Gemini regularly to make tool calls. Do you consider it equivalent to Sonnet 3.7? Has it required some prompt engineering?

jbellis 1 day ago

Confession time: litellm still doesn't support parallel tool calls with Gemini models [https://github.com/BerriAI/litellm/issues/9686] so we wrote our own "parallel tool calls" on top of Structured Output. It did take a few iterations on the prompt design but all of it was "yeah I can see why that was ambiguous" kinds of things, no real complaints.
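
A minimal sketch of what "parallel tool calls on top of Structured Output" can look like: ask the model to emit JSON matching a schema with a list of tool calls, then parse and execute them concurrently yourself. The tool names, schema, and registry here are illustrative assumptions, not Brokk's actual implementation; the model response is simulated.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tool registry -- names and signatures are illustrative only.
TOOLS = {
    "search_code": lambda query: f"results for {query!r}",
    "read_file": lambda path: f"contents of {path}",
}

# The model is prompted (via Structured Output) to return JSON shaped like:
#   {"tool_calls": [{"name": <tool name>, "args": {...}}, ...]}
def run_tool_calls(model_json: str) -> list:
    """Parse a structured-output response and execute each requested tool in parallel."""
    calls = json.loads(model_json)["tool_calls"]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(TOOLS[c["name"]], **c["args"]) for c in calls]
        return [f.result() for f in futures]

# Simulated model response requesting two tools at once:
response = json.dumps({"tool_calls": [
    {"name": "search_code", "args": {"query": "parallel tool calls"}},
    {"name": "read_file", "args": {"path": "README.md"}},
]})
print(run_tool_calls(response))
```

The upside of this approach is that it works with any model that supports structured/JSON output, sidestepping gaps in a provider's native parallel-tool-call support.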

GP2.5 does have a different flavor than S3.7, but it's hard to say that one is better or worse than the other [edit: at tool calling -- GP2.5 is definitely smarter in general]. GP2.5 is, I would say, a bit more aggressive about doing "speculative" tool execution in parallel with the architect, e.g. spawning multiple search agent calls at the same time. For Brokk that's generally a good thing, but I could see use cases where you'd want to dial that back.
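
The speculative fan-out described above can be sketched with plain asyncio: the architect kicks off several search agents concurrently and continues once they all report back. The agent functions here are stand-ins, not Brokk's real agents.

```python
import asyncio

# Illustrative stand-in for an LLM-backed search agent.
async def search_agent(query: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for a real model/tool round-trip
    return f"findings for {query!r}"

async def architect(queries: list[str]) -> list[str]:
    # Speculatively fan out several search agents at once; results arrive
    # in input order once every agent has finished.
    return await asyncio.gather(*(search_agent(q) for q in queries))

results = asyncio.run(architect(["auth flow", "retry logic"]))
print(results)
```

Dialing the speculation back would amount to capping the fan-out (e.g. with an asyncio.Semaphore) or running agents sequentially when calls are expensive.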