As long as one has to double-check and verify every single output, I don’t think that “automation” is the right word. Every LLM use is effectively a one-off and cannot be repeated blindly.
Undefined behavior as a service is truly a bizarre proposition to my ears. Layering undefined behavior (agents) and gaming undefined behavior in the hope that it comes out as you need (prompting) sounds insane, and sometimes I have to wonder if I am the insane one. Very weird times.