sterlind 1 day ago

It's not so surprising to me. It's like how Markov chains get better at passing for human the more N-grams they memorize. Larger models will continue getting marginally better at predicting the distribution (human language), but that doesn't translate into improved intelligence.
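
A minimal sketch of the N-gram idea above: memorize which tokens followed each context in a corpus, then sample from those counts. The corpus, function names, and parameters here are illustrative, not from any particular implementation; a longer context (larger n) makes output look more fluent without adding any reasoning.

```python
import random
from collections import defaultdict

def build_ngram_model(tokens, n=2):
    """Map each (n-1)-token context to the list of observed next tokens."""
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model

def generate(model, context, length=10, seed=0):
    """Sample a continuation token by token from the memorized counts."""
    rng = random.Random(seed)
    out = list(context)
    k = len(context)
    for _ in range(length):
        choices = model.get(tuple(out[-k:]))
        if not choices:
            break
        out.append(rng.choice(choices))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran".split()
model = build_ngram_model(corpus, n=2)
print(generate(model, ("the",), length=5))
```

The model only replays transitions it has seen, so more memorized N-grams mean more human-looking continuations, but nothing beyond the training distribution.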

rfoo 1 day ago

The point is, it isn't marginally better. I agree the setup is not a demonstration of intelligence, but the difference is pretty significant. Not to mention that on conventional benchmarks Llama 405B is usually worse than GPT-4o.