At the same time, we're at a point where synthesizing your voice is getting more trivial every day: a few seconds of audio is enough to make you say whatever someone wants.
Sure, but that doesn’t mean they learn everything I said: passwords, personal details, etc.
Also, getting a voice sample in the first place gets significantly easier that way: not everybody publishes video or audio recordings of themselves online.
> Passwords
Which reminds me, to strengthen your point: there is work[1] on keylogging via audio. It doesn't reach 100% keystroke recognition, but 93% over Zoom-quality audio streams is concerning enough for me.