2 million Bluesky posts, if you want to use them for something:
That's a very depressing and unpleasant-looking dataset. I don't know if that reflects more on Bluesky, or on the "uniform random sampling" aspect.
If we sampled HN data randomly, would *we* look that bad?
You can try it out here: https://huggingface.co/datasets/nixiesearch/hackernews-comme...
The first comment I see is Spez saying "winner winner chicken dinner"? Haha.