cranberryturkey 1 day ago

I have never had any issues running puppeteer on a server, what problems are you solving that would arise?

1
huss97 1 day ago

Yeah, that’s totally fair! Not everybody will. In our experience, the problems we solve are the creeping kind that develop over time and show themselves as maintenance costs.

Let’s say we want to add web automation to our app, and we've set up a single instance of Chrome hosted on a server somewhere that we connect to and drive with Puppeteer. Great, now we need to ensure incoming requests/sessions are managed properly (>1 active session). If we have a ton of active requests, this becomes annoying because now the resources on our server are being eaten up — so we would need to scale horizontally (or vertically) but now that comes with potential downtime and cold start issues. If we decide to keep this instance of Chrome running 24/7 then we need to bake in resource management to handle memory leaks and connection issues. If we don’t keep them alive, then this comes with significant cold start times (10s+).

Now, let’s say we want to support a real-time scraping use case that requires multiple browsers in parallel. At this point, we would need to use something like Kubernetes with warm pods or maybe lambdas/cloud run. But even those have their own set of challenges/costs. Managing Kubernetes can get complicated and expensive quickly, especially if you’re not already using it for your app. On the other hand, serverless options like Lambda or Cloud Run introduce latency issues (cold starts) and often don’t provide the flexibility you need for long-running sessions or custom configurations.

Then, beyond the infra, we'll probably need to build capabilities to not get stopped out by anti-bot measures like proxy mgmt, fingerprint rotation, captcha solving, etc.

Pretty quickly, as you grow, a small-scale project can turn into a pretty big maintenance project. Building parsers/agents/scripts is hard enough, so we're hoping to make the infra side as easy as possible.