red75prime • 6 days ago

LLMs are doing what you train them to do. See, for example, "The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions" by Eric Wallace et al.

MattPalmer1086 • 6 days ago

Interesting. It doesn't solve the problem entirely, but it seems to be a viable strategy for mitigating it somewhat.