Autotab definitely takes a hybrid approach, because when it comes to deciding where on the page to take an action, Autotab has to be fast & cheap (humans are both) while also being robust to changes. The solution we use is a "ladder of compute": depending on how difficult the task is, Autotab uses everything from very fast heuristics and local models up to the biggest frontier models.
For instance, if Autotab is trying to click the "submit" button on a sparse page that looks like previous versions of that page, that click might take a few hundred milliseconds. But if the page is very noisy, Autotab has to scroll, and the button now says "next" because an additional step has been added to the flow, Autotab will probably escalate to a bigger model to help it find the right answer with enough certainty to proceed.
Past a certain cutoff in that hierarchy of compute, we call the behavior "self-healing", because latency is high enough that we wanted to let users know it might take a bit longer for Autotab to proceed to the next step.
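To make the escalation pattern concrete, here is a minimal sketch of what a "ladder of compute" loop could look like. Everything in it is hypothetical: the tier names, confidence thresholds, `Candidate`/`Tier` types, and `find_target` function are illustrative stand-ins for the pattern described above, not Autotab's actual implementation.

```python
# Hypothetical sketch of a "ladder of compute" for locating where to act on a
# page. All names, tiers, and thresholds are illustrative, not Autotab's code.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Candidate:
    selector: str      # where on the page the agent would click
    confidence: float  # the locator's certainty, in [0, 1]

@dataclass
class Tier:
    name: str
    locate: Callable[[str, str], Optional[Candidate]]  # (page, goal) -> match
    min_confidence: float  # only act if this tier is at least this certain
    self_healing: bool     # rungs past the latency cutoff warn the user first

def notify_user(message: str) -> None:
    print(message)  # stand-in for surfacing status in the product UI

def find_target(page: str, goal: str, ladder: list[Tier]) -> Candidate:
    """Walk the ladder from cheapest to most expensive tier, stopping as soon
    as one tier locates the target with enough certainty to proceed."""
    for tier in ladder:
        if tier.self_healing:
            notify_user("Self-healing: this step may take a little longer...")
        match = tier.locate(page, goal)
        if match is not None and match.confidence >= tier.min_confidence:
            return match  # a cheap rung was certain enough; stop escalating
    raise RuntimeError(f"No tier located {goal!r} with enough certainty")

# Example ladder: a fast heuristic first, a big model only as a fallback.
ladder = [
    Tier("heuristic-match",
         lambda page, goal: Candidate("button#submit", 0.97)
         if goal in page else None,
         min_confidence=0.95, self_healing=False),
    Tier("frontier-model",
         lambda page, goal: Candidate("button#next", 0.80),  # stubbed answer
         min_confidence=0.70, self_healing=True),
]

print(find_target("<button id='submit'>submit</button>", "submit", ladder))
```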
So no computer use (pixel-level understanding).
That's disappointing, as the devtools approach always has limitations.
Kura agents, Runner H, and scrapybara will all end up more reliable than you.
If by "pixel level" you mean vision-first understanding and control of the UI, then you've misunderstood my comment - Autotab primarily uses vision to reason about screens and take action.
You can also use Anthropic’s Computer Use model directly in Autotab via the instruct feature - our users find it most helpful for handling specific subtasks that are complex to spell out, like picking a date in a calendar.