Item 43688277

Yup, GPT 4.1 isn't good at all compared to the others. I tried a bunch of different scenarios; for me the winners are:

Deepseek for general chat and research
Claude 3.7 for coding
Gemini 2.5 Pro experimental for deep research

In terms of price, Deepseek is still absolutely fire!

OpenAI is in trouble honestly.

torginus • 4 days ago

One task I do is feed the models the text of entire books and ask them various questions about the book ('what happened in Chapter 4', 'what did character X do in the book', etc.).

GPT 4.1 is the first model that has provided a human-quality answer to these questions. It seems to be the first model that can follow plotlines and character motivations accurately.

I'd say since text processing is a very important use case for LLMs, that's quite noteworthy.