Item 43668005

pixl97 • 8 days ago

Eh, I don't really see that.

Crawling the web has a huge moat because a huge number of sites have blocked 'abusive' crawlers, making exceptions only for Google and possibly Bing.

For example, just try to crawl sites like Reddit and see how long it takes before you're blocked and get a "please pay us for our data" message.

literalAardvark • 8 days ago

My experience running a few hundred very successful shops (hundreds of thousands of orders per month) is that there's no need for quotes around 'abusive'.

95% of our load is from crawlers, so we have to pick who to serve.

If they want our data, all they need to do is offer a way for us to send it. We're happy to increase exposure, and updates to shopping aggregation sites are our second-highest priority task after price and availability updates.

vidarh • 7 days ago

It may be tricky, but it's a piece of cake compared to doing good retrieval.