jasonjmcghee 6 days ago

I am excitedly waiting for the first company (guessing / hoping it'll be anthropic) to invest heavily in improvements to caching.

The big ones that come to mind are cheap long-term caching, and innovations in compaction and differential reuse - like, is there a way to only use the parts of the cached input context we need?
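
One reason the differential part is hard: with causal attention, the cached keys/values for token i depend on every token before it, so today's caches can only reuse an exact leading match of the prompt, never an arbitrary middle slice. A rough sketch of that constraint (all names are mine, purely illustrative):

    def reusable_prefix_len(cached_tokens: list[int], new_tokens: list[int]) -> int:
        """With causal attention, the KV entries for token i depend on
        tokens 0..i, so a cache hit can only cover an exact common prefix
        of the old and new prompts - never an arbitrary middle slice."""
        n = 0
        for a, b in zip(cached_tokens, new_tokens):
            if a != b:
                break
            n += 1
        return n

    # Tokens past the common prefix must be re-prefilled from scratch:
    # cost ~ len(new_tokens) - reusable_prefix_len(cached_tokens, new_tokens)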

manmal 5 days ago

Isn’t a problem there that the cache would be model-specific, with cached items only valid for exactly the same weights and inference engine? I think both of those are heavily iterated on.
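
To make that concrete: a cached KV entry is only reusable if everything that influenced the computation matches exactly. A hypothetical cache key might look like this (field names are illustrative, not any vendor's actual scheme):

    import hashlib

    def kv_cache_key(model_id: str, weights_hash: str,
                     engine_version: str, prompt_prefix: str) -> str:
        """Hypothetical cache key: cached KV tensors are only valid if every
        one of these matches exactly, since the numbers were produced by this
        specific set of weights on this specific inference engine."""
        parts = [model_id, weights_hash, engine_version,
                 hashlib.sha256(prompt_prefix.encode()).hexdigest()]
        return hashlib.sha256("\x1f".join(parts).encode()).hexdigest()

    # Ship a new checkpoint or upgrade the engine and every key changes,
    # so the whole cache is effectively invalidated.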

simonw 5 days ago

Prompt caches right now only last a few minutes - I believe they involve keeping a bunch of calculations in-memory, which is why Gemini and Anthropic charge an initial fee to populate the cache but then give a discount on prompts that reuse it.
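
For anyone who hasn't tried it, the Anthropic flavor looks roughly like this: you mark a stable content block with cache_control, pay a premium on the first call that writes the cache, and get a discounted rate on later calls that hit it before the TTL expires (model name here is illustrative):

    import anthropic

    client = anthropic.Anthropic()

    big_doc = open("manual.txt").read()  # large, stable context worth caching

    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=1024,
        system=[{
            "type": "text",
            "text": big_doc,
            "cache_control": {"type": "ephemeral"},  # mark this prefix as cacheable
        }],
        messages=[{"role": "user", "content": "Summarize section 3."}],
    )

    # usage reports cache_creation_input_tokens on the first call (the write fee)
    # and cache_read_input_tokens on later calls within the cache TTL (the discount)
    print(response.usage)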