Does Ollama support the "user context" that higher-level LLM products like ChatGPT have?
I'm not sure what these features are called (or how they're implemented), but I mean roughly: 1) the initial system prompt/context (the kind that, for example, Grok has gotten into trouble with recently), and 2) the saved context that lets ChatGPT know things about your prompt history so it can better answer future queries.
(My use of ollama has been pretty bare-bones, and I haven't seen anything covering these higher-level features in --help.)
My understanding is that ollama is more of an "LLM backend", i.e. it provides a server process on your machine that answers requests relatively statelessly.
I believe it keeps the model loaded across sessions, and it might keep the KV cache warm for ongoing sessions (though I doubt it, given the API shape: there's no "session" parameter), but that's about it. Nothing seems to be written to disk.
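For what it's worth, the model staying loaded looks like a per-request knob ("keep_alive" in the API docs) rather than any kind of session. A minimal sketch, assuming a local server on the default port and a pulled model named "llama3":

    # "keep_alive" controls how long the model stays in memory after this
    # request (default 5m per the docs); there is no session object to hold.
    import requests

    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": "hi", "stream": False, "keep_alive": "30m"},
    )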
Features like ChatGPT's "memories" or cross-chat context require a persistence layer that's probably best suited for a "frontend". Ollama's API does support passing in requests with history, for example: https://github.com/ollama/ollama/blob/main/docs/api.md#chat-...
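Something like this, as a minimal sketch in Python (again assuming a local server on the default port and a model named "llama3"):

    # Each request carries the whole conversation; the server keeps no
    # per-user state between calls.
    import requests

    history = [
        {"role": "user", "content": "My name is Sam."},
        {"role": "assistant", "content": "Nice to meet you, Sam!"},
        {"role": "user", "content": "What's my name?"},  # only answerable via the history above
    ]

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "llama3", "messages": history, "stream": False},
    )
    print(resp.json()["message"]["content"])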
Is there more to memory than just an entry into the context/messages array passed to the LLM?
There must be some heavy compression/filtering going on, since there's no chance GPT can hold a user's entire ChatGPT conversation history in its context.
But practically speaking, I believe Ollama just doesn't have any concept of server-side persistent state at the moment, so it couldn't do such a thing anyway.
I _think_ the compression used is literally “Chat, compress this array of messages”. This is the technique used in Claude Plays Pokemon.
I'm sure there's more to the actual prompt, and to what is done with the newly generated messages array, but that's the gist of it.
If this is the case, an Ollama implementation shouldn’t be too difficult.
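A minimal sketch of what that could look like against Ollama's /api/chat endpoint. To be clear, the threshold, the summarization prompt, and the model name are all made up for illustration:

    # Hypothetical summary-based "memory": once the history grows past a
    # threshold, ask the model itself to compress the older messages into
    # a single summary message, then keep chatting with the shorter list.
    import requests

    OLLAMA_CHAT = "http://localhost:11434/api/chat"
    MODEL = "llama3"  # assumption: any locally pulled chat model

    def chat(messages):
        resp = requests.post(OLLAMA_CHAT, json={"model": MODEL, "messages": messages, "stream": False})
        return resp.json()["message"]["content"]

    def compress_history(messages, keep_last=4):
        # Replace everything but the last few messages with a model-written summary.
        old, recent = messages[:-keep_last], messages[-keep_last:]
        if not old:
            return messages
        transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in old)
        summary = chat([{
            "role": "user",
            "content": "Compress this conversation into a short list of facts worth remembering:\n\n" + transcript,
        }])
        # The summary rides along as a system message in future requests.
        return [{"role": "system", "content": "Summary of earlier conversation: " + summary}] + recent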