Item 42245612

I don't understand what problem you're solving that puppeteer doesn't already solve.

huss97 • 2 months ago

We love Puppeteer and use it to power our API!

We don't compete with Puppeteer/Playwright/Selenium; we're focused on solving the problem of hosting Chromium browsers in the cloud.

The workflow jump we've seen, both ourselves and from customers, is that it's pretty straightforward to build these scripts locally, but the moment you want to run them in prod, you need to deal with a ton of overhead: packaging into Docker, resource management, scaling, etc. We want to make that trivial :)

4 replies

cranberryturkey • 2 months ago

I have never had any issues running Puppeteer on a server. What problems are you solving that would arise?

1 reply

huss97 • 2 months ago

Yeah, that’s totally fair! Not everybody will. In our experience, the problems we solve are the creeping kind that develop over time and show themselves as maintenance costs.

Let’s say we want to add web automation to our app, and we've set up a single instance of Chrome hosted on a server somewhere that we connect to and drive with Puppeteer. Great, now we need to ensure incoming requests/sessions are managed properly once there is more than one active session. With a ton of active requests, this becomes annoying because the resources on our server are being eaten up, so we would need to scale horizontally (or vertically), which comes with potential downtime and cold start issues. If we decide to keep this instance of Chrome running 24/7, then we need to bake in resource management to handle memory leaks and connection issues. If we don’t keep it alive, then we pay significant cold start times (10s+).
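To make the session-management point concrete, here is a rough sketch of capping concurrent sessions against one shared browser. It is only an illustration under stated assumptions: `SessionLimiter`, `max_sessions`, `fake_browser_task`, and `handle_requests` are hypothetical names, and the browser work is simulated with a short sleep rather than real Puppeteer calls.

```python
import asyncio


class SessionLimiter:
    """Caps how many sessions use the shared browser at once (hypothetical helper)."""

    def __init__(self, max_sessions: int) -> None:
        self._sem = asyncio.Semaphore(max_sessions)
        self.active = 0  # sessions currently running
        self.peak = 0    # highest concurrency observed

    async def run(self, task):
        # Requests beyond the cap wait here instead of overloading the server.
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await task()
            finally:
                self.active -= 1


async def fake_browser_task() -> str:
    # Stand-in for driving a page; a real version would talk to Chrome.
    await asyncio.sleep(0.01)
    return "done"


async def handle_requests(n_requests: int, max_sessions: int):
    # Build the limiter inside the running event loop, then fan out the requests.
    limiter = SessionLimiter(max_sessions)
    results = await asyncio.gather(
        *(limiter.run(fake_browser_task) for _ in range(n_requests))
    )
    return list(results), limiter.peak


if __name__ == "__main__":
    results, peak = asyncio.run(handle_requests(n_requests=10, max_sessions=3))
    print(len(results), "requests served, peak concurrency", peak)
```

The cap protects the server, but it also means extra requests queue up, which is where the pressure to scale out comes from.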

Now, let’s say we want to support a real-time scraping use case that requires multiple browsers in parallel. At this point, we would need to use something like Kubernetes with warm pods, or maybe Lambda/Cloud Run. But even those have their own set of challenges/costs. Managing Kubernetes can get complicated and expensive quickly, especially if you’re not already using it for your app. On the other hand, serverless options like Lambda or Cloud Run introduce latency issues (cold starts) and often don’t provide the flexibility you need for long-running sessions or custom configurations.
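The "warm" idea mentioned above can be sketched in a few lines: start a fixed number of workers ahead of time so requests borrow one instead of paying the startup cost. This is an assumption-laden toy, not how Kubernetes or any hosted service does it: `WarmPool`, `startup_seconds`, the `browser-N` names, and `demo` are all hypothetical, and a launch is simulated with a sleep.

```python
import asyncio


class WarmPool:
    """Keeps pre-started workers ready so requests skip the startup cost (hypothetical)."""

    def __init__(self, size: int, startup_seconds: float) -> None:
        self._size = size
        self._startup_seconds = startup_seconds
        self._idle = asyncio.Queue()  # names of workers that are free right now
        self.cold_starts = 0

    async def _launch(self, name: str) -> str:
        # Stand-in for launching a browser; real launches take far longer.
        await asyncio.sleep(self._startup_seconds)
        self.cold_starts += 1
        return name

    async def warm_up(self) -> None:
        # Pay the startup cost once, ahead of any request.
        names = await asyncio.gather(
            *(self._launch(f"browser-{i}") for i in range(self._size))
        )
        for name in names:
            self._idle.put_nowait(name)

    async def use(self, job):
        # Borrow an idle worker, waiting if all are busy, then hand it back.
        name = await self._idle.get()
        try:
            return await job(name)
        finally:
            self._idle.put_nowait(name)


async def demo():
    pool = WarmPool(size=2, startup_seconds=0.01)
    await pool.warm_up()

    async def job(name: str) -> str:
        await asyncio.sleep(0.001)  # simulated page work
        return name

    results = await asyncio.gather(*(pool.use(job) for _ in range(5)))
    return list(results), pool.cold_starts


if __name__ == "__main__":
    served_by, cold_starts = asyncio.run(demo())
    print(served_by, "cold starts:", cold_starts)
```

The trade-off is the one described above: idle workers cost money while they wait, and the pool size still has to be managed as load changes.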

Then, beyond the infra, we'll probably need to build capabilities to avoid being blocked by anti-bot measures, such as proxy management, fingerprint rotation, captcha solving, etc.

Pretty quickly, as you grow, a small-scale project can turn into a pretty big maintenance project. Building parsers/agents/scripts is hard enough, so we're hoping to make the infra side as easy as possible.

bleachpedro3 • 2 months ago

So this is the Vercel approach to web bots? Hopefully less predatory aha..

1 reply

huss97 • 2 months ago

haha, I love this analogy! If we can do for deploying web automation to prod what Vercel did for deploying front-ends, I'd be a happy camper. With even friendlier pricing of course :P

meiraleal • 2 months ago

And why run in the cloud instead of on my own computer, with my own IP that won't be flagged for being an AWS IP?

peab • 2 months ago

I can relate to this exact experience and I think anybody who's tried to bring scraping to prod can as well!

1 reply

huss97 • 2 months ago

haha love to hear that :) we want to make that experience trivial