It would be possible to employ an expert doctor, instead of writing a script.
Which is cheaper:
1. having a human expert creating every answer
or
2. having an expert check 10 answers each of which have a 90% chance of being right and then manually redoing the one which was wrong
Now add a complications that:
• option 1 also isn't 100% correct
• nobody knows which things in option 2 are correlated or not and if those are or aren't correlated with human errors so we might be systematically unable to even recognise the errors
• even if we could, humans not only get lazy without practice but also get bored if the work is too easy, so a short-term study in efficiency changes doesn't tell you things like "after 2 years you get mass resignations by the competent doctors, while the incompetent just say 'LGTM' to all the AI answers"