I do exactly this with Hoarder. I passively build tagged knowledge bases from the archived pages and then feed them to a RAG setup.
Cool. Hoarder looks interesting, thanks for the tip. How is it working out for you? Are you using the feature for auto-hoarding RSS feeds?
I am! It works great, and for sites without RSS it’s reasonably easy to snapshot them on a cron job.
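
For anyone wondering what the cron approach might look like, here is a minimal sketch, not the setup described above. It assumes the instance exposes a REST endpoint at `/api/v1/bookmarks` that accepts a JSON body with `type` and `url` and a bearer API key; check your instance's API docs before relying on that. The environment variable names (`HOARDER_URL`, `HOARDER_API_KEY`), the URL list, and the `--send` flag are all made up for illustration. How the server treats a URL that is already bookmarked is not covered here.

```python
import json
import os
import sys
import urllib.request

# Hypothetical settings: point these at your own instance and API key.
BASE_URL = os.environ.get("HOARDER_URL", "http://localhost:3000")
API_KEY = os.environ.get("HOARDER_API_KEY", "changeme")

# Placeholder pages that have no RSS feed and should be submitted on each run.
URLS = ["https://example.com/changelog", "https://example.org/news"]


def build_request(url, base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) a POST asking the server to bookmark `url`."""
    payload = json.dumps({"type": "link", "url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/bookmarks",  # assumed endpoint path
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def snapshot_all(urls=URLS):
    """Send one bookmark request per URL and return the HTTP status codes."""
    statuses = []
    for url in urls:
        with urllib.request.urlopen(build_request(url), timeout=30) as resp:
            statuses.append(resp.status)
    return statuses


if __name__ == "__main__":
    if "--send" in sys.argv:
        print(snapshot_all())
    else:
        # Dry run by default: only list what would be submitted.
        for u in URLS:
            print("would archive:", u)
```

A crontab line such as `0 6 * * * python3 /path/to/snapshot.py --send` (path is a placeholder) would run it once a day at 06:00.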