Maybe that's true for absolute armchair-engineering outsiders (like me), but these models are in training for months, and training data is probably prepared a year or more in advance. These models have a knowledge cut-off in 2024, so they have been in training for a while. There's no way sama did not have a good idea two months ago that this non-CoT model was in the pipeline. It had probably finished training by then and was undergoing evals.
Maybe
1. he's just doing his job and hyping OpenAI's competitive advantages (AFAIR most of the competition didn't have decent CoT models in Feb), or
2. something changed and they're releasing models now that they didn't intend to release 2 months ago (maybe because a model they did intend to release is not ready and won't be for a while), or
3. CoT is not really as advantageous as it was deemed to be 2+ months ago, and/or it's computationally too expensive.
With the new hardware Nvidia has announced coming out, those months turn into weeks.
I doubt it's going to be weeks; the months were already turning into years despite Nvidia's previous advances.
(Not to say that it takes OpenAI years to train a new model, just that the time between major GPT releases seems to double, whether that's due to data gathering, training, or breaks between training generations. Either way, model training seems to get harder, not easier.)
GPT Model | Release Date (DD.MM.YYYY) | Months Since Previous Model
GPT-1 | 11.06.2018 | -
GPT-2 | 14.02.2019 | 8.16
GPT-3 | 28.05.2020 | 15.43
GPT-4 | 14.03.2023 | 33.55
[1]https://www.lesswrong.com/posts/BWMKzBunEhMGfpEgo/when-will-...
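For what it's worth, here's a quick sketch that recomputes the gaps from the dates above; the only number I'm adding is the ~30.4 days-per-month constant, which is a rough assumption. The consecutive gap ratios come out around 1.9x and 2.2x, so "roughly doubling" holds up so far.

    from datetime import date

    # Release dates taken from the table above (DD.MM.YYYY)
    releases = {
        "GPT-1": date(2018, 6, 11),
        "GPT-2": date(2019, 2, 14),
        "GPT-3": date(2020, 5, 28),
        "GPT-4": date(2023, 3, 14),
    }

    DAYS_PER_MONTH = 30.4  # rough average month length (assumption)

    names = list(releases)
    gaps = []
    for prev, curr in zip(names, names[1:]):
        months = (releases[curr] - releases[prev]).days / DAYS_PER_MONTH
        gaps.append(months)
        print(f"{prev} -> {curr}: {months:.2f} months")

    # Ratios between consecutive gaps: roughly 1.9 and 2.2
    for a, b in zip(gaps, gaps[1:]):
        print(f"gap ratio: {b / a:.2f}")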
The capabilities and general utility of the models are increasing on an entirely different trajectory from model names; the information you posted is 99% dependent on internal OpenAI processes and market activities, as opposed to anything to do with AI.
I'm talking more broadly as well, including audio, video, and image modalities, general robotics models, and the momentum behind applying some of these architectures to novel domains. Protocols like MCP and automation tooling are improving rapidly, with media production and IT work being automated wherever possible. When you throw in the chemistry and materials science advances, protein modeling, etc., we have enormously powerful AI with insufficient compute and expertise to apply it to everything we might want to.

We have research being done on alternate architectures, and optimization being done on transformers, that is rapidly reducing the cost/performance ratio. There are models you can run on phones that would have been considered AGI 10 years ago, and there doesn't seem to be any fundamental principle decreasing the rate of improvement yet. If alternate architectures like RWKV get funded, there might be several orders of magnitude of improvement with relatively little disruption to production model behaviors, but other architectures like text diffusion could obsolete a lot of the ecosystem being built up around LLMs right now.
There are a million little considerations pumping transformer LLMs right now because they work and there's every reason to expect them to continue improving in performance and value for at least a decade. There aren't enough researchers and there's not enough compute to saturate the industry.
Fair point. I guess my question is how long it would take them to train GPT-2 on the absolute bleedingest generation of Nvidia chips vs what they had in 2019, given the budget they have to blow on Nvidia supercomputers today.
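Very rough back-of-envelope, where every constant is an assumption rather than a published figure (GPT-2's total training compute is commonly estimated somewhere around 1e21-2e21 FLOPs, and a modern H100-class GPU peaks at roughly 1 PFLOP/s in BF16, of which maybe a third is realized in practice):

    # Back-of-envelope: wall-clock time to retrain GPT-2 on modern GPUs.
    # All constants below are rough assumptions, not published figures.

    GPT2_TRAIN_FLOPS = 1.5e21  # assumed total training compute for GPT-2
    GPU_PEAK_FLOPS   = 1e15    # assumed BF16 peak of one H100-class GPU
    MFU              = 0.35    # assumed fraction of peak achieved in training
    NUM_GPUS         = 1024    # assumed cluster size a frontier lab could use

    effective_flops = GPU_PEAK_FLOPS * MFU * NUM_GPUS
    seconds = GPT2_TRAIN_FLOPS / effective_flops
    print(f"~{seconds / 3600:.1f} hours on {NUM_GPUS} GPUs")
    print(f"~{GPT2_TRAIN_FLOPS / (GPU_PEAK_FLOPS * MFU) / 86400:.0f} GPU-days total")

Under those assumptions a GPT-2-scale run is an hour or two on a thousand-GPU cluster, or about 50 GPU-days total, which is basically a rounding error for today's budgets.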