Hey HN, we're Alexi and Jonas, the co-founders of Autotab (https://autotab.com). Autotab is a Chrome-based browser you can teach to do complex tasks, with a simple API for running them from your app or backend.
Here is a walkthrough of how it works: https://youtu.be/63co74JHy1k, and you can try it for free at https://autotab.com by downloading the app.
Why a dedicated editor?
The number one blocker we've found in building more flexible, agentic automations is performance quality BY FAR (https://www.langchain.com/stateofaiagents#barriers-and-chall...). For all the talk of cost, latency, and safety, the fact is most people are still just struggling to get agents to work. The keys to solving reliability are better models, yes, but also intent specification. Even humans don't zero-shot these tasks from a prompt. They need to be shown how to perform them, and then refined with question-asking + feedback over time. It is also quite difficult to formulate complete requirements on the spot from memory.
The editor makes it easy to build the specification up as you step through your workflow, while generating successful task trajectories for the model. This is the only way we've been able to get the reliability we need for production use cases.
But why build a browser?
Autotab started as a Chrome extension (with a Show HN post! https://news.ycombinator.com/item?id=37943931). As we iterated with users, we realized that we needed to focus on creating the control surface for intent specification, and that being stuck in a Chrome side panel wasn't going to work. We also knew that we needed a level of control for the model that we couldn't get without owning the browser. In Autotab, the browser becomes a canvas on which the user and the model take turns showing and explaining the task.
Key features:
1. Self-healing automations that don't break when sites change
2. Dedicated authoring tool that builds memory for the model while defining steps for the automation
3. Control flows and deep configurability to keep automations on track, even when navigating complex reasoning tasks
4. Works with any website (no site-specific APIs needed)
5. Runs securely in the cloud or locally
6. Simple REST API + client libraries for Python, Node
We'd love to get any early feedback from the HN community, ideas for where you'd like the product to go, or experiences in this space. We will be in the comments for the next few hours to respond!
If I understand this correctly, it looks like the promise I saw in that 'Record Macro' button in my Excel toolbar in the 1990s might finally be coming to fruition in a wider and more capable sense! A pleasant surprise effect of the new AI situation if true.
I noticed in another comment that you said some steps can be made 'optional' (e.g. clicking through a modal). In my ancient Excel macro adventure, what I learned was that I had to tweak the heck out of the VBA code that Record button generated, which led to me just straight writing VBA for everything and eventually abandoning the Record feature entirely. I had a similar experience later on with AutoHotKey. What are the analogous aspects of Autotab to this? Also, to what extent is hand-manipulating the underlying automation possible and/or necessary to get optimal results?
Indeed! A little secret: Internally we call the skills/workflows in Autotab macros :)
Currently there is a bit of a learning curve for training Autotab to be really reliable in hard cases. We expect we’ll be able to decrease it significantly in the next few months, as we get models to do more of the thinking about how to best codify a given task solution/workflow. As an intuition pump for why we expect such rapid progress: in the scenario you described, you’d just have a model write the VBA code for you.
I love the idea - owning the browser definitely seems like the right approach.
I tried it out on a workflow I've been manually piecing together and it gave me a bunch of "Error encountered, contact support" messages when doing things like clicking on a form input field, or even a button.
The more complex "Instruction" block worked correctly instead (literally things like "click the 'Sign In' button"), but then I ran out of the 5 minutes of free run time when trying to go through the full flow. I expect this kind of thing will be fixed soon, as it grows.
In terms of ultimate utility, what I really want is something which can export scripts that run entirely locally, but fall back to the more dynamic AI-enhanced version when an error is encountered. I would want Autotab to generate the workflow, which I could then run on my own hardware in bulk.
Anyway, great work! This is definitely the best implementation I've seen of that glimpsed future of capable AI web browsing agents.
sorry you encountered that issue! what website was the form on? we'll see if we can catch the error!
curious what you mean by generating the workflow that you run on your own hardware? Is this different than running Autotab locally?
Hah, looks like you guys found my account error via my profile email, nice! Thanks for fixing that bug. I'll try again tomorrow when the fix is pushed.
My other request is probably not in line with your business model. I get the sense that Autotab is always communicating with some server on your end, probably for the various bits of AI functionality. What I was asking for is the ability to export the actions/workflow as, say, a python script (like a Selenium script, or even better, a script which drives your browser) which performs the actions in the Autotab workflow.
I need AI understanding when creating the workflow, or healing in case of an error, but I don't always need it when just executing a prepared script. In those (non AI needed) cases, I don't really want to use up my runtime minutes just because I'm executing a previously generated workflow.
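A minimal sketch of the run-locally-and-fall-back idea described above, not something Autotab is known to offer: each step runs a cheap local action first, and an AI-backed healer is called only when that action raises. `Step`, `run_workflow`, and the `heal` callable are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # deterministic, runs locally (e.g. a scripted click)


def run_workflow(steps: List[Step], heal: Callable[[Step, Exception], None]) -> List[str]:
    """Run each step locally; call the (hypothetical) AI healer only on failure.

    Returns the names of the steps that needed healing.
    """
    healed = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            heal(step, exc)  # the only place metered AI runtime would be spent
            healed.append(step.name)
    return healed
```

With this shape, a run where every scripted step succeeds never touches the healer, which is the "don't use up runtime minutes" property asked for above.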
Really exciting to see this approach to automation and intent specification! We’ve been working with similar challenges at Origins AI, where we focus on deep tech solutions.
I can’t overstate how much having a robust system for breaking down tasks and iterating on them has helped us.
For one of our recent projects, we had to integrate complex workflows with third-party systems, and it was clear that reliability came down to how well we could define and refine intent over time.
I’m especially curious about your self-healing automations. That’s an area where we’ve found a lot of value using models that can adapt to subtle UI changes, but it’s always a tradeoff with latency. Would love to hear more about how you balance that in production!
Looking forward to trying Autotab and seeing how it compares with some of the internal tools we’ve built!
Agree on the tradeoff between ability to handle novel situations and speed/cost. Autotab uses a “ladder of compute” system that escalates to the minimal level of compute required to solve a given subtask. I wrote a longer comment about this on another thread.
Very neat in theory but I'm failing to find any technical details.
At which layer is the automation happening? Inside, using Dev tools? Multiple?
What is the self-healing mechanic? I'm guessing invoking an LLM to find what happened and fix it?
I guess what I'm wondering is: is this some sort of hybrid between computer use and Dev tools usage?
Autotab is definitely a hybrid approach, because when it comes to deciding where on the page to take an action, Autotab has to be fast & cheap (humans are both of those) while also being robust to changes. The solution we use is a "ladder of compute" where Autotab uses everything from really fast heuristics and local models up to the biggest frontier models, depending on how difficult the task is.
For instance, if Autotab is trying to click the "submit" button on a sparse page that looks like previous versions of that page, that click might take a few hundred milliseconds. But if the page is very noisy, and Autotab has to scroll, and the button says "next" on it because the flow has an additional step added to it, Autotab will probably escalate to a bigger model to help it find the right answer with enough certainty to proceed.
There is a certain cutoff in that hierarchy of compute that we decided to call "self-healing" because latency is high enough that we wanted to let users know it might take a bit longer for Autotab to proceed to the next step.
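To make the "ladder of compute" idea concrete, here is a rough sketch under stated assumptions. It is not Autotab's implementation: the locator functions, the confidence scores, and the 0.9 threshold are all invented for illustration. The only point it shows is the control flow: try the cheapest rung first and escalate only when confidence is too low.

```python
from typing import Callable, List, Optional, Tuple

# A locator takes a page description and returns (target, confidence in [0, 1]),
# or None if it cannot find anything.
Locator = Callable[[str], Optional[Tuple[str, float]]]


def locate(page: str, ladder: List[Tuple[str, Locator]], threshold: float = 0.9) -> Tuple[str, str]:
    """Try locators from cheapest to most expensive; stop at the first confident answer.

    Returns (rung_name, target). Raises LookupError if no rung is confident enough.
    """
    for name, locator in ladder:
        result = locator(page)
        if result is not None and result[1] >= threshold:
            return name, result[0]
    raise LookupError("no rung was confident enough")


# Stand-in locators for illustration only.
def heuristic(page: str) -> Optional[Tuple[str, float]]:
    return ("button#submit", 0.95) if "submit" in page else None


def big_model(page: str) -> Optional[Tuple[str, float]]:
    return ("button:next", 0.92)  # pretend a slower model reasoned about the page


LADDER = [("heuristic", heuristic), ("frontier-model", big_model)]
```

In this sketch, a "self-healing" cutoff would simply be the index in `LADDER` past which the caller shows the user a notice that the step may take longer.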
So no computer use (pixel-level understanding).
That's disappointing as the devtools approach always has limitations.
Kura agents, Runner H, and scrapybara will all end up more reliable than you.
If by pixel level you mean vision-first understanding and control of the UI then you’ve misunderstood my comment - Autotab primarily uses vision to reason about screens and take action.
You can also use Anthropic’s Computer Use model directly in Autotab via the instruct feature - our users find it most helpful for handling specific subtasks that are complex to spell out, like picking a date in a calendar.
You say "try it for free" but your website has no pricing information at all. Is this free for just a while? Free forever? What is your monetization strategy?
Can I point it at my own LLM or am I locked into using OpenAI?
We have unlimited free editing, so you can fully try everything out and know your skill will work before we ask you to subscribe. You also get 5 minutes of free runtime. Subscriptions start at $39/month with 300 minutes of runtime included.
Right now we do not let you BYO LLM, but it's something we would love to provide an option for where possible!
5 minutes seems like barely enough time to complete any given task, let alone actually try it out. $40/mo for a capped plan seems steep, but maybe I'm not your target customer. Best of luck!
The free edit mode has all of the features of run mode, and lets you fully test the skill. The only difference is that inside of a loop it will ask you to click to continue.
A lot of AI tools promise the world and don't deliver. We explicitly don't want anyone to pay us until they're sure Autotab can do their task, even though the model costs during editing are actually much higher than during runtime.
Good point, will add pricing information to our website ASAP, had skipped that one in the push to launch (it is only available in the app at the moment)
This is awesome! What is your most common use case? Have you thought of competing with https://scribehow.com/ in the documentation space?
Thanks! Our most common use cases are repetitive tasks people have at work, think updating Hubspot with analytics data from an internal tool or reconciling payments between an invoicing system, a payment system and a CRM.
Haven’t done a lot with Scribe-like documentation cases. Given the pace at which this technology is developing we’re focused on making Autotab really good at the most economically valuable tasks.
How on earth does this help with reconciling payments? Can Autotab also recognize "this transaction belongs to this invoice" or does it just copy and paste all transaction and invoice data into a spreadsheet for manual reconciliation?
Yes, Autotab can reason over the state of applications and the data it is seeing. You can also teach it to do certain steps only in specific cases.
If you wanted Autotab to reconcile payments you would teach it to go to wherever the payments are listed eg a banking app. There you would have it iterate through the unreconciled payments. For each payment you’d have Autotab go to the invoicing tool and look up any details from the payment (eg IBAN, information from the reference number, amount, etc) to find the matching customer and invoice. This is where most of the reasoning happens - you can teach Autotab what counts as sufficiently close to be a match with prompts and examples. Then you can have Autotab mark the invoice as paid and go back to the payment app and mark the payment with the invoice number it grabbed from the matched payment.
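The matching rule at the heart of that flow ("what counts as sufficiently close to be a match") can be sketched in plain code. This is only an illustration of one possible rule, assuming flat payment and invoice records. In Autotab the rule would be taught with prompts and examples, and the `Payment`, `Invoice`, and `match_invoice` names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Payment:
    reference: str
    amount_cents: int


@dataclass
class Invoice:
    number: str
    amount_cents: int


def match_invoice(payment: Payment, invoices: List[Invoice], tolerance_cents: int = 0) -> Optional[Invoice]:
    """Return the matching invoice, or None if there is no unambiguous match.

    Prefer an invoice whose number appears in the payment reference; if that is
    missing or ambiguous, fall back to a unique amount match within the tolerance.
    """
    by_reference = [inv for inv in invoices if inv.number in payment.reference]
    if len(by_reference) == 1:
        return by_reference[0]
    by_amount = [
        inv for inv in invoices
        if abs(inv.amount_cents - payment.amount_cents) <= tolerance_cents
    ]
    return by_amount[0] if len(by_amount) == 1 else None
```

Returning `None` on ambiguity is a deliberate choice in this sketch: an unmatched payment can be left for a human rather than marked paid against the wrong invoice.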
The functionality looks very very cool. But the privacy policy raises an eyebrow - am I overreacting?
Usage Information. To help us understand how you use our Services and to help us improve them, we automatically receive information about your interactions with our Services, like the pages or other content you view, the searches you conduct, and the dates and times of your visits.
Desktop Activity on our Services. In order to provide the Services, we need to collect recordings of your desktop activity while using our Services, which may include audio and video screen recordings, your cookies, photos, local storage, search history, advertising interactions, and keystrokes.
Information from Cookies and Other Tracking Technologies. We and our third-party partners collect information using cookies, pixel tags, SDKs, or other tracking technologies. Our third-party partners, such as analytics partners, may use these technologies to collect information about your online activities over time and across different services.
[...]
How We Disclose the Information We Collect
Affiliates. We may disclose any information we receive to any current or future affiliates for any of the purposes described in this Privacy Policy.
Vendors and Service Providers. We may disclose any information we receive to vendors and service providers retained in connection with the provision of our Services.
We work with Fortune 500 companies and have HIPAA-compliant offerings, so we are very sensitive to privacy and security concerns. Fundamentally the models need to operate on whatever browser tasks users ask Autotab to perform, and we need to use frontier vision models like 4o and Claude to reliably perform them (model providers are the affiliates in question). If you have specific concerns, happy to answer them.
Your response doesn't seem to address the privacy concerns raised. Why is the policy so broad and invasive? There's no mention of how you handle PII collected as telemetry.
Is Autotab able to scrape data from multiple websites with different structures and combine this data into structured data in one CSV or JSON file? Example: scrape interest rates offered on savings accounts from multiple bank websites and extract the name of the bank, bank logo, product name and interest rate for each account and run this saved query on a regular schedule (daily, weekly etc)?
Assuming the banks’ websites look totally different from one another, you’d need open-ended exploration to do the data extraction. We’ve historically focused more on reliability for repetitive tasks than on flexibility for open-ended tasks, but models are getting good enough that this tradeoff is diminishing. Expect updates from us on this front soon.
You can schedule skills in Autotab to run at arbitrary frequency.
I see it's able to perform data extraction, but what if you wanted to enter in data from another system, or generated by an LLM during the workflow?
Data from external systems can be provided to Autotab in the form of CSV files or string inputs, which can be passed to the API to parametrize skills. However, in most cases, ingesting data into Autotab is easiest by just having Autotab navigate to the website where the data is present.
Autotab has a structured type system underlying the workflows, so any data processed in the course of an automation can be referenced in later steps. It's a bit like a fuzzy programming language for automation, and the model generates schemas to ensure data flows reliably through the series of steps.
For example, users often start by collecting information in one system (using an extract step as you mentioned), then cross reference it in another and then submit some data by having Autotab type it into a third system. In Autotab, you can just type @ to reference a variable, each step has access to data from previous steps.
At the end, you can get a dump of all of Autotab's data from a run as a JSON file, or turn specific arrays of data into CSV files using a table step.
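For anyone post-processing the JSON dump outside of Autotab, turning one array of records into CSV takes only the standard library. This is a sketch: the dump shape below (an `accounts` array of flat objects) is made up for illustration, and the real export format may differ.

```python
import csv
import io
import json
from typing import Dict, List


def array_to_csv(rows: List[Dict[str, object]]) -> str:
    """Turn a list of flat dicts into CSV text, using the union of keys as columns."""
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    # Missing keys are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical dump shape; the real export format may differ.
dump = json.loads('{"accounts": [{"bank": "A", "rate": 4.1}, {"bank": "B", "rate": 3.9}]}')
csv_text = array_to_csv(dump["accounts"])
```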
I don’t know what your intention is but I imagine that’s how more and more are going to push LLM slop on all corners of the internet. It’ll be easy to do in massive quantities.
If this was an OSS project automating a specific service many HN-ers would come and bleat about TOS violations & being scared/wary of C&Ds.
How does this not violate TOS? Do you have legal protection set up from megacorps trying to bully you with legal threats?
Automation despite TOS via Adversarial Interop should be a Digital Human Right. Godspeed.
This has been much less of an issue than I would have expected - Autotab is optimized for reasoning heavy tasks in core systems that require high reliability over being really fast at doing giant scrapes. More automating leads in Salesforce, tickets in Jira and data in Airtable than hawking tickets.
Just want to reiterate I fully support what you're doing and I despise the megacorps that send out legal threats to small companies/OSS devs but according to their overbroad TOS they do not make distinctions between the types of automations and reasoning behind them - technically, they would argue, both you and your users are violating TOS. I'm sure you have already, but make sure the legal help at YC give you the ammo you need to protect yourself and your customers when some of them randomly start getting banned.
As more and more AI Agent enabled tooling comes out, this will become a bigger issue (the fact that people are automating these services against the TOS) so it's good if everyone who can get legal help has and shares the tactics to fight back against any civil TOS-based legal threats so we are all protected.
This is awesome. I was just trying to get a rudimentary version of this for some "user" interaction heavy data extraction. Definitely giving it a try.
For a case with lots of requests, how does Autotab handle IP blocking? Does each run use a different portal instance?
When you run Autotab in the app it runs locally, so no IP blocking issues there. If you want to run it in the cloud eg via API, by default your IP will be from the data center but we have residential proxies that we can enable on a case by case basis.
Just tried it - very cool indeed. I did a page loop extraction but it seems to be the same speed when I run it. The elements I am doing the loop on look pretty much the same, just different images. I think it would be great if it was able to generalize how to find an element, with CSS selectors for example, to speed up once it's sure that is the data you are looking to extract for a given loop.
Totally agree, making page loop faster is on the top of our list of things to do! There are cases where you need page loop to do quite a bit of reasoning so it will be this slow until models get faster, but we can make it a lot faster today on happy paths - stay tuned :)
> we have residential proxies that we can enable on a case by case basis.
Who is your vendor for residential proxies? That’s quite a sketchy industry.
We use a range of different providers, it really depends on the customer and use case. We only enable the proxy in rare cases that need it for a specific reason.
I don't read docs. Didn't get it to work the way I wanted... It needs simplification.
Have you considered how to handle mobile verification codes, graphic verification codes, and "proving you are not a robot" verification methods?
Quoting my cofounder from another thread:
For 2FA, different users take different approaches. Everything from teaching Autotab to pull auth codes from their email, to setting intervention requests at the top of their skills, to enterprise integrations that we support with SSO and dedicated machine accounts.
Autotab also has the ability to securely sync session data from your local app to cloud instances. This usually removes the need for doing 2FA again for sites with “remember this device” functionality.
We can enable captcha solving for select customers, but don’t allow that in the public app to prevent abuse.
Pretty slick. I recorded a session for ordering from a restaurant website, and it did repeat the entire workflow. It had some issues with a modal that popped up, but all in all well done! We have been trying to robotify the task of ordering from restaurants for our clients, and it seems like your solution can work well for us. I am guessing that you want your users to use the Autotab browser, so what is the use for the API?
Thanks! We think of the browser as an authoring tool where you create, test and refine skills.
After you've done that, the API is great for cases where you want to incorporate Autotab into a larger data flow or product.
For instance, say Company A has taught Autotab to migrate their customers' data - so their customers just see a sync button in the Company A product, which kicks off an Autotab run via API. Same for restaurant booking, if you'd want that to happen programmatically.
Understood! How does it work if we have several different restaurants to order from? Do I need to record each ordering session and create skills for each restaurant, or can it infer on its own given the task to order from a restaurant? Secondly, are there any docs or samples showing how to integrate this with your API?
Depends on how different the flows are for different restaurants. If they're just different names but use the same booking system you'd typically use an input and have Autotab find the correct restaurant first. If they're totally different booking systems you can try the instruct (open ended agentic) step but my guess is that will be too slow and unreliable for now, so you'd probably want to record different skills for each.
Docs are here with sample code: https://docs.autotab.com/api-reference
Is the API also charged based on runtime? And I'm assuming that workflow happens in the cloud? What if it's behind a login? What if that login requires 2FA?
Yep exactly. Authentication is primarily handled with session data, so passwords never leave your device, but we also support setting secrets.
Here is more info on auth and security: https://docs.autotab.com/manual/security
For 2FA, different users take different approaches. Everything from teaching Autotab to pull auth codes from their email, to setting intervention requests at the top of their skills, to enterprise integrations that we support with SSO and dedicated machine accounts.
Also for the modal popup - this is the kind of issue that goes away in run mode because Autotab will escalate to bigger models to self-heal.
If the modal pops up frequently you can also record a click to dismiss it and make that click optional, so Autotab knows to move on when the modal does not pop up.
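The semantics of an optional step can be sketched as follows. This is a simplified model under my own assumptions, not Autotab's internals: each step is a hypothetical `(name, action, optional)` tuple, and an action returns `True` only if it found its target and acted on it.

```python
from typing import Callable, List, Tuple

# (name, action, optional). An action returns True if it found and acted on its target.
StepSpec = Tuple[str, Callable[[], bool], bool]


def run_steps(steps: List[StepSpec]) -> List[str]:
    """Run steps in order; skip optional steps whose target is absent.

    Returns the names of steps that actually ran. Raises if a required step fails.
    """
    executed = []
    for name, action, optional in steps:
        if action():
            executed.append(name)
        elif not optional:
            raise RuntimeError(f"required step failed: {name}")
    return executed
```

Marking the modal-dismiss click as optional means a run without the modal proceeds straight to the next step, while a missing required target still stops the run.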
> As it runs, Autotab asks for clarifications and feedback. These learnings are accumulated into action memory—improving Autotab's world model, and allowing it to work reliably for hours on end.
Is "learning", used as a noun, a term of art in this field?
If not, my reactioning to that using is that it is a being bad English that causes producings of gratings on the ears.
It's honestly common industry slang and may be British English.
It's not really that common in British English. I've heard it from colleagues who learnt English in India.
Been working in this space for almost 9 years and have written a lot of scrapers and web automations for various clients. I am really excited to build something like this too. Are you guys hiring? Would love to chat.
Warning: they want you to be in the office 6 days a week.
We are hiring. Feel free to reach out at [email protected]
Honestly, the video feels like just any low/nocode tutorial video in a sense “that we’re going to automate something” and a minute later we are copying urls into some complex forms and following the voiceover of something you cannot grasp the meaning of. A little intro of what exactly we are doing would help.
I cold-watched only half of it, without reading any info on the project, but that’s how everyone does it, I guess.
But I get the idea. Automate by example with automatic scenario builder and fuzzy matching ui via ai.
As someone who works in automation, I (again, blindly) suggest looking into anti-detection and human behavior like mouse movements, typing errors and pauses, because that’s what your (and all ours) main enemy will be in the next decade.
All in all, this is in high demand, afaiu. I tend to use a classic ML approach for that (avoiding browser automation cause it obviously only works in a browser and limits/divides the area of application), but would love to try something that self-heals on site changes. Although I think I’d better use something that can detect changes and reconfigure my ML params rather than using it directly, cause I don’t really trust modern AI to free-float in runtime, and also costs.
MacBook Pro M3 Max; latest macOS version:
Autotab has exited due to multiple fatal errors. Please contact support for assistance: [email protected].
Sorry about that! I don't see any matching errors from 2 hours ago in our logs - if you reach out to the contact@ email address with the email you used in Autotab, I'd be happy to take a closer look
One thing I would recommend. Install instructions for Linux/Windows/Mac. Not finding them in the documentation.
Thanks for the note, we will try to make the install instructions clearer. The desktop app is available via a download button on the homepage: https://autotab.com
Super cool. Congrats & well done. Can I install a Chrome extension within this browser and automate some actions on it?
Thanks! We currently have to manually add Chrome extensions on our side, but plan on supporting users installing arbitrary extensions in the future. So far we’ve found that most apps offer web UIs with the same functionality as the extension and Autotab can just use those.
What extension would you like to automate?
Looks nice. Anybody else in this space? This one is on the pricier end but I’m just a single user so maybe not the target customer
If we are being honest, most of these browser screen scraping startups will be commoditized the moment OpenAI/Anthropic releases their next model. From my experience, having an in-house smaller model working in tandem with the bigger LLMs doesn't necessarily produce a better result, because in-context learning is just too powerful. The moment OpenAI releases a new model with a better prior, you will see a lot of these companies quietly swapping out their in-house "edge"/specialized fine-tuned models. It's like those PDF data extraction companies that have been launching like crazy: 90% will be pivoting if they don't get enough B2B customers locked in. LLMs unfortunately are winner-take-all, with the actual model providers cutting out all the middlemen.
AskUI could be a solution. It's also not just in browser, but the whole desktop: https://github.com/askui/vision-agent
https://openadapt.ai is open source (MIT license).
Curious, what would you be interested in using Autotab for?
Automating the creation of test orders in our Ecom and ERP tools is one possible use case I can think of, though I’m sure I’d find others in my day to day (possibly around some of the rote tasks I have in Confluence or DevOps)
That sounds like a really good use case! we're constrained by model costs but are interested in offering a lower cost plan – if you email me I'll see what we can do [email protected]
I tried it out on a website I am testing at work but sadly it failed to complete a form :(
what was the website? happy to help figure out your issue, you can also start a chat with us in the app (top left)
'Google SSO'
Urgh. I was excited about this. Anxiously awaiting email/other SSO (we use MS).
Is it possible to get a personal license for testing?
Hi, do you offer proxies?
Yes, proxies are something we can enable for select customers. If your use case requires them, feel free to reach out at [email protected]
Where are the API docs / client libraries?