Continual feedback means continual training. No way around it. So you’d have to scope down the functional unit to a fairly small lora in order to get reasonable re-training costs here.
That's not quite true. The system prompt is state that you can use for "training" in a way that fits the problem here. It's not differentiable so you're in slightly alien territory, but it's also more comprehensible than gradient-descending a bunch of weights.
Or maybe figure out a different architecture
Either way, the end user experience would be vastly improved