crimsoneer 5 days ago

Isn't the problem more that the "needle in a haystack" eval ("I said word X once, where?") is really not relevant to most long-context LLM use cases like code, where you need the context from everything simultaneously rather than identifying a single, quite separate relevant section?

omneity 5 days ago

What you're describing as "needle in a haystack" is a prerequisite for the downstream ability you want. The real distinction is how many "things" the LLM can track in a single shot.

LLMs process tokens sequentially: first in a prefill stage, where the model reads your input, then in the generation stage, where it outputs response tokens. The attention mechanism is what allows the LLM, as it ingests or produces tokens, to "notice" that a token it has seen previously (your instruction) is related to a token it is seeing now (the code).
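Roughly, the core of that mechanism is scaled dot-product attention. Here's a toy numpy sketch (illustrative only, not any particular model's implementation; single head, no masking or learned projections):

    import numpy as np

    def attention(Q, K, V):
        # Each query token scores every key token in the context;
        # a high score means "that earlier token is relevant to me".
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # Softmax (numerically stabilized) turns the scores into
        # weights over the whole context.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # The output for each token is a weighted mix of the values
        # of all the tokens it attended to.
        return weights @ V

So a token from your instruction and a token from the code interact only through those pairwise scores, which is why relating two far-apart spans is possible at all, and also why it degrades.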

Of course this mechanism has limits (correlated with model size), and if the LLM needs to take the whole input into consideration to answer the question, the results won't be good.