weird-eye-issue 7 days ago

I think this can largely be solved with good UI. For example, if an MCP or tool gets executed that you didn't want to get executed, the UI should provide an easy way to turn it off or to edit the description of that tool to make it more clear when it should be used and should not be used by the agent.
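As a rough illustration of what such a UI might sit on top of, here is a minimal sketch of a tool registry with an off switch and an editable description per tool. The names `Tool`, `ToolRegistry`, `send_email`, and `search_docs` are all hypothetical and not taken from MCP or any real agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    description: str
    enabled: bool = True


@dataclass
class ToolRegistry:
    """Hypothetical backing store for a 'manage tools' UI."""
    tools: dict[str, Tool] = field(default_factory=dict)

    def register(self, name: str, description: str) -> None:
        self.tools[name] = Tool(name, description)

    def disable(self, name: str) -> None:
        # The "turn it off" button: the tool stays registered but hidden.
        self.tools[name].enabled = False

    def edit_description(self, name: str, description: str) -> None:
        # The "edit description" field: tightens when the agent should use it.
        self.tools[name].description = description

    def visible_to_agent(self) -> list[dict[str, str]]:
        # Only enabled tools are advertised to the agent.
        return [
            {"name": t.name, "description": t.description}
            for t in self.tools.values()
            if t.enabled
        ]


registry = ToolRegistry()
registry.register("send_email", "Send an email.")
registry.register("search_docs", "Search internal docs.")

# The user saw an unwanted call and reacts through the UI.
registry.edit_description(
    "send_email",
    "Send an email. Use only when the user explicitly asks to send one.",
)
registry.disable("search_docs")
```

After these two actions, `registry.visible_to_agent()` lists only `send_email`, with the stricter description.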

Also, in my experience, there is a huge bump in performance and real-world usability as the context grows. So I definitely don't agree that there is a negative correlation in general, although in some use cases and with the wrong context it certainly can be true.

zoogeny 6 days ago

I don't think that could be sufficient to solve the problem.

I'm using Gemini with AI Studio and the sheer size of a 1 million token context window is becoming apparent to me. I have a large conversation, multiple paragraphs of text on each side, with only 100k tokens or so. Just scrolling through that conversation is a chore, to the point where it is easier to ask the LLM what we were talking about earlier than to try to find it myself.

So if I have several tools, each of them adding 10k+ context to a query, and all of them reasonable tool requests - I still can't verify that it isn't something "you [I] didn't want to get executed" since that is a vague description of the failure states of tools. I'm not going to read the equivalent of a novel for each and every request.

I say this mostly because I think some level of inspectability would be useful for these larger requests. It just becomes impractical at larger and larger context sizes.
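One cheap form of inspectability would be a per-tool summary of how much each tool added to the request, so the large contributions stand out without reading everything. A minimal sketch, assuming a crude whitespace word count as a stand-in for a real tokenizer; `approx_tokens`, `context_report`, and the sample tool names are hypothetical.

```python
def approx_tokens(text: str) -> int:
    # Assumption: whitespace-separated words as a rough proxy for tokens.
    # A real tokenizer would give different counts.
    return len(text.split())


def context_report(
    tool_outputs: dict[str, str], budget: int
) -> list[tuple[str, int, bool]]:
    """Return (tool name, approx tokens, over budget?) sorted largest first."""
    rows = [(name, approx_tokens(out)) for name, out in tool_outputs.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(name, count, count > budget) for name, count in rows]


# Hypothetical outputs from two tools in one request.
outputs = {
    "search_docs": "result " * 20,
    "get_weather": "sunny " * 5,
}

for name, count, over in context_report(outputs, budget=10):
    flag = "OVER BUDGET" if over else "ok"
    print(f"{name}: ~{count} tokens ({flag})")
```

This does not tell you whether a call was wanted, only where the bulk of the context came from, which narrows down what is worth reading.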

robertlagrant 7 days ago

> For example, if an MCP or tool gets executed that you didn't want to get executed, the UI should provide an easy way to turn it off or to edit the description of that tool to make it more clear when it should be used and should not be used by the agent.

Might this become more simply implemented as multiple individual calls, possibly even to different AI services, chained together with regular application software?
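To make the idea concrete, here is a minimal sketch of such a chain: ordinary application code decides the order of steps, and each step could wrap a single call to some AI service. The functions `classify` and `draft_reply` are hypothetical stand-ins written as plain Python so the example runs without any external service.

```python
from typing import Callable

# Each step takes the previous step's output and returns its own.
Step = Callable[[str], str]


def run_workflow(steps: list[Step], user_input: str) -> str:
    """Run a fixed sequence of steps; the application, not a model, picks the order."""
    result = user_input
    for step in steps:
        result = step(result)
    return result


def classify(text: str) -> str:
    # Stand-in for one model call that labels the request.
    return "refund" if "refund" in text.lower() else "other"


def draft_reply(label: str) -> str:
    # Stand-in for a second call, possibly to a different service.
    return f"Drafting a reply for a '{label}' request."


print(run_workflow([classify, draft_reply], "I want a REFUND"))
```

Because the sequence is fixed in code, there is no tool selection step for a model to get wrong; the trade-off is that the flow cannot adapt beyond what the application anticipates.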

weird-eye-issue 6 days ago

I don't understand your question at all.

If you are asking why have autonomous agents at all rather than just workflows, then the answer is that it depends on the use case. Most of the time non-autonomous workflows are much better, but not always, and sometimes a workflow will also include autonomous parts.