Or text-to-speech generation... but I guess that is coming.
Yeah, I tried the 4o models and they severely mispronounced common words and read numbers incorrectly (e.g., reading 16000 as 1600).