ben_w 2 days ago

Infectious energy is something I expect can be faked relatively easily, though I don't know your examples, and doing it in AI might be a "first 90%" situation just like self-driving cars. For me the problem is that they're fairly mediocre at the actual script, based on putting a blog post I wrote into one and listening to what came out.

Given how many podcasts exist, I think you need to be at least 2 standard deviations above the mean to even get noticed, 3 to be a moderate success, and 4 to be in the charts.

I'd guess AI is "good enough" to be 1 standard deviation above average, as the NotebookLM voices sound like people speaking clearly and with some joy into decent microphones in sound-isolating studios.
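
To put rough numbers on those thresholds: if podcast quality were normally distributed (purely my assumption for illustration, nothing I've measured), the share of shows above each cutoff is just the upper tail of the standard normal. The helper name `fraction_above` is made up for this sketch.

```python
import math

def fraction_above(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable.

    Assumes quality is normally distributed, which is an illustrative
    assumption and not something established in this thread.
    """
    return 0.5 * math.erfc(z / math.sqrt(2))

# Print the share of the distribution above each cutoff discussed above.
for z in (1, 2, 3, 4):
    p = fraction_above(z)
    print(f"{z} SD above the mean: top {p:.4%}, about 1 in {round(1 / p):,}")
```

Each extra standard deviation shrinks the pool a lot faster than the previous one did, which is the point of the 2 / 3 / 4 ladder.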

vunderba 2 days ago

I probably should've clarified that by infectious energy I wasn't referring so much to the vocal aspect as to the overall quality, the interaction between the hosts, and pithiness / wit.

Having experimented with many LLMs (Mixtral, Sonnet, ChatGPT, Llama, etc.), the coherence is for the most part on point, but I've found their capacity for novelty wanting irrespective of how I tuned the top_k, temperature, or prompts.

That being said, I've seen some very impressive examples of style transfer, even conveying emotional range, in some of the SOTA TTS systems.
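
For anyone unfamiliar with the top_k and temperature knobs mentioned above, here is a toy sketch of what they do at the sampling step. It is a generic illustration, not how any particular model or API implements it; `sample_top_k` and the example logits are made up.

```python
import math
import random

def sample_top_k(logits, top_k, temperature, rng=random):
    """Pick a token index from the top_k highest logits after temperature scaling.

    Hypothetical helper for illustration; real inference stacks differ in detail.
    """
    if top_k < 1 or temperature <= 0:
        raise ValueError("top_k must be >= 1 and temperature must be > 0")
    # Keep only the indices of the top_k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Dividing by temperature flattens (T > 1) or sharpens (T < 1) the distribution.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights, shifted by the max for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

# Example with made-up logits for a four-token vocabulary.
logits = [2.0, 0.5, 1.0, -1.0]
print(sample_top_k(logits, top_k=2, temperature=1.5, rng=random.Random(0)))
```

Raising either knob only widens the choice among tokens the model already rates as likely, which may be part of why tuning them didn't buy much novelty.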