That’s hilarious, but at least Llama was trained on libgen, an archive of most books and publications by humanity, no? Except for the ones which were not digitized I guess
So there is probably a big pile of Reddit comments, Twitter messages, and libgen and arXiv PDFs, I imagine
So there is some shit, but also painstakingly encoded knowledge (i.e. writing), and yeah, it is miraculous that LLMs are right as often as they are
libgen is far from an archive of "most" books and publications, not even close.
The most recent numbers from libgen itself are 2.4 million non-fiction books and 80 million science journal articles. The Atlantic's database published in 2025 has 7.5 million books.[0] The publishing industry estimates that many books are published each year. As of 2010, Google counted over 129 million books.[1] At best an LLM like Llama will have 20% of all books in its training set.
0. https://www.theatlantic.com/technology/archive/2025/03/libge...
1. https://booksearch.blogspot.com/2010/08/books-of-world-stand...
On libgen.mx they claim to have 33,569,200 books and 84,844,242 articles
Still an order of magnitude short of "all", and falling farther behind every year.
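For a rough sense of scale, here is a minimal sketch that divides each collection size quoted above by Google's 2010 count. The numbers are simply the ones cited in this thread, not independently verified, and the names (`GOOGLE_2010_TOTAL`, `collections`, `coverage`) are made up for illustration. Since the denominator is from 2010 and books keep being published, the real shares today would be lower.

```python
# Figures as quoted in the comments above; not independently verified.
GOOGLE_2010_TOTAL = 129_000_000  # Google's 2010 count of all books

collections = {
    "libgen non-fiction (self-reported)": 2_400_000,
    "The Atlantic's 2025 database": 7_500_000,
    "libgen.mx claimed books": 33_569_200,
}


def coverage(count, total=GOOGLE_2010_TOTAL):
    """Return count as a fraction of total."""
    return count / total


for name, count in collections.items():
    print(f"{name}: {coverage(count):.1%}")

# Prints:
# libgen non-fiction (self-reported): 1.9%
# The Atlantic's 2025 database: 5.8%
# libgen.mx claimed books: 26.0%
```

So depending on which count you trust, it's somewhere between about 2% and 26% of the 2010 total, and only the largest claimed figure gets past the 20% mark.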
It's a miracle, but it's all thanks to the post-training. When you think of it, for so-called "next token predictors", LLMs talk in a way that almost no one actually talks, with perfect spelling and use of punctuation. The post-training somehow is able to get them to predict something along the lines of what a reasonably intelligent assistant with perfect grammar would say. LLMs are probably smarter than is exposed through their chat interface, since it's unlikely the post-training process is able to get them to impersonate the smartest character they'd be capable of impersonating.
I dunno, I actually think that, say, Claude AI SOUNDS smarter than it is right now
It has phenomenal recall. I just asked it about "SmartOS", something I knew about, vaguely, in ~2012, and it gave me a pretty darn good answer. On that particular subject, I think it probably gave a better answer than anyone I could e-mail, call, or text right now
It was significantly more informative than Wikipedia - https://en.wikipedia.org/wiki/SmartOS
But I still find it easy to stump it and get it to hallucinate, which makes it seem dumb
It is like a person with good manners and a lot of memory, who is quite good at comparisons (although you have to verify, which is usually fine)
But I would not say it is "smart" at coming up with new ideas or anything
I do think a key point is that a "text calculator" is doing a lot of work ... i.e. summarization and comparison are extremely useful things. They can accelerate thinking