lxgr 13 hours ago

There must be some heavy compression/filtering going on, as there's no chance GPT can hold everybody's entire ChatGPT conversation history in its context.

But practically, I believe Ollama currently has no concept of server-side persistent state, so it couldn't do such a thing even if it wanted to.

codybontecou 13 hours ago

I _think_ the compression used is literally a prompt along the lines of “Chat, compress this array of messages”. This is the technique used in Claude Plays Pokemon.

I’m sure there’s more to the prompt, and to how the newly generated summary gets spliced back into the messages array, but the gist is there.

If this is the case, an Ollama implementation shouldn’t be too difficult.
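A minimal sketch of that idea: keep the last few messages verbatim and replace everything older with a single model-generated summary. The helper names here are made up, and the `summarize` callable is stubbed out; in a real implementation it would wrap an actual LLM call (e.g. `ollama.chat`).

```python
def compact_history(messages, summarize, keep_last=4):
    """Replace all but the last `keep_last` messages with one summary message.

    `messages` is a list of {"role": ..., "content": ...} dicts;
    `summarize` is any callable that maps a prompt string to a summary string
    (in practice, a call out to the model itself).
    """
    if len(messages) <= keep_last:
        return list(messages)

    older, recent = messages[:-keep_last], messages[-keep_last:]

    # Flatten the older messages into a plain transcript for the model.
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    summary = summarize(
        "Compress this conversation into a short summary, keeping any "
        "facts the assistant will need later:\n" + transcript
    )

    # Prepend the summary as a single system message before the recent tail.
    return [
        {"role": "system", "content": f"Summary of earlier conversation: {summary}"}
    ] + recent
```

You'd trigger this whenever the history approaches the model's context limit, then send the compacted list on the next request, so the server never needs persistent state of its own.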