pbmango 5 days ago

I think an underappreciated reality is that all of the large AI labs, and OpenAI in particular, are fighting multiple market battles at once. This comes across in both the number of products and the packaging.

1, to win consumer growth they have continued to benefit from hyper-viral moments; lately that was image generation in 4o, which was likely technically possible long before it launched. 2, for enterprise workloads and large API use, they seem to have focused less lately, but the pricing of 4.1 is clearly an answer to Gemini, which has been winning on ultra-high volume and consistency. 3, for full frontier benchmarks they pushed out 4.5 to stay SOTA and attract the best researchers. 4, on top of all that, they had to, and did, quickly answer the reasoning promise and the DeepSeek threat with faster and cheaper o models.

They are still winning many of these battles, but history highlights how hard multi-front warfare is, at least for teams of humans.

spiderfarmer 5 days ago

On that note, I want to see benchmarks for which LLMs are best at translating between languages. To me, it's an entire product category.

pbmango 5 days ago

There are probably many more small battles being fought or emerging. I think voice and PDF parsing are growing battles too.

oezi 4 days ago

I would love to see a stackexchange-like site where humans ask questions and we get to vote on the reply by various LLMs.

anotherengineer 4 days ago

is this like what you're thinking of? https://lmarena.ai

oezi 4 days ago

Kind of. But lmarena.ai has no way to see the responses to questions other people asked, and it only lets you compare two responses side by side.

kristianp 5 days ago

I agree. 4.1 seems to be a release that addresses 4o's shortcomings in coding compared to Claude 3.7 and Gemini 2.0 and 2.5.