Incredibly, no videos linked in an article about a video newscast. I think this is an example. The AI doesn't even pronounce "AI" correctly. Interestingly, it looks slightly offscreen just the way real newsreaders do when they're on prompter.
https://www.youtube.com/watch?v=Aa7Q2S7VWUk
They do a lot right. There's interaction between the bots. They look kind of professional but not Los Angeles/New York quality, which is what you'd expect from a smallish market. Their movement is also kind of stiff and amateurish, which I believe is intentional.
Newscast teleprompters are mounted directly in front of the camera lens specifically so the anchors aren't looking away from the lens. This has been a solved technology for decades. Perhaps you're thinking of cue cards, or the teleprompters speakers use in a live-audience speech setting?
Well you got me. I haven't watched broadcast TV for decades. I do see that phenomenon a lot with vloggers at present.
Am I also incorrect that they appear not to be looking directly at the camera? Looking back after your comment, I still think it feels like they aren't.
The AI characters? Nothing about them feels right. The audio looks out of sync with the fake lip flaps. The dude's arm gestures are horrendous. It's AI/cgi, yet the fake background looks like a bad chromakey. You already pointed out some of the audio/voice issues.
> It's AI/cgi, yet the fake background looks like a bad chromakey.
I genuinely wonder if that's intentional. Maybe that looks more "realistic", and gives the audience something to stumble over that's not other AI artefacts?
The result to me looked more like a Zoom background replacement than a weather chromakey. That's what really looked bad to me. Even a full studio chromakey looks much better, where the anchors are at a desk in front of a solid color rather than a full virtual studio.
I could only find this video. James' arms go up and down in an alarming manner. Rose has more natural movements but the voice you hear when her mouth moves is worse than the worst foreign film voice-over. Somehow the person and the voice mismatched in "tone" in a way that's hard to describe.
Love this so much, not in the way intended. It's just so strange! I can't put my finger on it, but it feels like something Tim and Eric, or Tim Robinson, or even Alan Resnick would have a hand in.
There is a kind of aesthetic immanence to the whole thing; everything is right on the surface. The voices are only just embodied "enough," their unearned confidence, their "affectations." The deadpan delivery on an absurd stage. The colors all feel like a cake that is too sweet. Like approximating a memory of a broadcast.
It is hilarious and beautiful. No notes.
Yeah, I can see all of it, but the problem for me is that I bet I would have watched it for a few seconds and clicked off out of boredom, never suspecting they were AI. I really want to claim I would have figured it out instantly, but I can't. If I were a regular consumer, I don't think I'd notice.
They mention right up front they're "powered by AI" but to me that implies they had help with article writing. I would not immediately assume from that statement that the actual newsreaders themselves were AI.
I was surprised at how game the AI was to pronounce the Hawaiian place names; it was confident enough that I assumed the pronunciation was correct. The article notes that it is butchering the place names, though.
To me this illustrates a common cognitive mismatch when evaluating AI: it can be confident in a way that most humans can't, and that misleading social cue is another reason we trust its output.
I've seen plenty of human newsreaders be confidently incorrect about place names. And some pronunciations aren't necessarily "wrong" so much as contested.
The first thing I thought of when I saw this is that some mid-tier dictatorships could replace a lot of their newscasters with this approach. Can always guarantee they'll say what they need to say, and a lack of emotion is a plus, maybe? Except when the dear leader passes; then you bring out a real person for the emotions.
Well if the article is to be believed, they actually can't guarantee they'll say what they need to say, but I think your larger point holds.
Looks like they're using something like motion matching to recover fragments of the presenter's motion that match the pronounced phonemes. The actors were probably instructed to avoid almost all movement to make sure it was blendable. That would explain why the guy's hands have such erratic and unnatural movement.
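To make the guess concrete, here's a minimal sketch of phoneme-keyed motion matching. Everything here is an assumption for illustration (the fragment library, the `rest` fallback, the ARPAbet-style phoneme labels); the actual pipeline is unknown. The point it demonstrates: with a small fragment library, unmatched phonemes keep snapping back to the same fallback pose, which would produce exactly the repetitive, looping gestures people describe above.

```python
from dataclasses import dataclass, field


@dataclass
class MotionFragment:
    """A short captured motion clip, keyed by the phoneme it was recorded with."""
    phoneme: str
    frames: list = field(default_factory=list)  # placeholder pose data


def match_fragment(target_phoneme: str, library: list) -> MotionFragment:
    """Pick the fragment whose phoneme matches exactly, else fall back to a
    neutral 'rest' pose. A tiny library means frequent fallbacks, i.e. the
    same clip looping over and over."""
    for frag in library:
        if frag.phoneme == target_phoneme:
            return frag
    return next(f for f in library if f.phoneme == "rest")


# Hypothetical fragment library captured from a mostly-still actor.
library = [
    MotionFragment("AA", ["pose_open_mouth"]),
    MotionFragment("M", ["pose_closed_mouth"]),
    MotionFragment("rest", ["pose_neutral"]),
]

# Phoneme timeline as a TTS front end might emit it (hypothetical input);
# "ZH" has no captured fragment, so it falls back to the rest pose.
timeline = [match_fragment(p, library) for p in ["M", "AA", "ZH", "M"]]
print([f.phoneme for f in timeline])
```

A real system would blend fragments over time and match on longer phoneme windows rather than single symbols, but the failure mode is the same: the smaller and stiffer the capture library, the more visibly mechanical the output.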
Watching this, I'm left wondering why my brain doesn't want to blend the visual and the audio. I don't think it's the bad lip sync. I have this weird feeling that these people, were they real, wouldn't have these voices. But I can't quite put my finger on why. I haven't watched movie dubs in a while, but maybe that's the same kind of phenomenon that makes dubs sound bad. Or maybe we grow an intuitive sense of what a person's voice might sound like based on the appearance of their face's bone structure and muscles?
James' lips don't seem to move at all.
The problem with such "videocasts" (as opposed to "podcasts") is that there is another channel that the AI has to control: the video. Generating convincing video is much harder than generating convincing audio.
The male host’s hands are literally on a loop, it is disturbing. And the female host had several nonsensical sentence fragments. The script isn’t even up to par with what you would see in a college news show.
The way the mouths move is so far off from the words they're speaking that my first impression would be that they're just playing a video loop of these people talking about other random things and dubbing over it.
They could’ve definitely fixed the audio by making it sound like the avatars were wearing mics.
Honestly the whole thing is so off-putting and lazy.
What problem are they trying to solve though?
Paying human presenters.
Human presenters aren't too expensive and are quite flexible, are easily replaced and can make or break a show. Yeah, there's the novelty factor now but am not sure how long it'll take until GenAI on broadcasts will signal second rate, subpar knockoff.
>Human presenters aren't too expensive and are quite flexible, are easily replaced and can make or break a show.
well, if you owned 257 local media outlets across an area five thousand miles wide, you might have the experience and the insight to see it differently. you might identify an opportunity for innovative advancements in efficiency
Maybe we can increase the efficiency by having bots watch the automatic news, too, that way they can vote so we don't have to. With enough innovation we may live to see the first automated senator!
But then wonder why people don’t stick around…
Unless they have some info channels which ultimately don’t need any presenters in the first place.
Thanks for the primary source. Concur the quality is poor. Google’s NotebookLM podcast summary is way more natural sounding.
The video was even worse than the audio. The lip sync was off. The girl looked like someone else’s mouth was mapped onto her face.
The guy looks like a deceased person used as a marionette, and the girl speaks like a tenor.
But I guess it will get better. TV will turn so soulless, even compared to today.
Imagine Rakuten Dog Does Funny Stuff channel with this added as some filler. Dystopic.
"James began his tenure as lead anchor, at which point he was unable to blink and his hands were constantly vibrating. He was demoted to second anchor in mid-October, where he began blinking more regularly and his odd hand vibration was replaced by a single emphatic gesture."
>> The AI doesn't even pronounce "AI" correctly.
You can call me Al.
Not great, but it's surprisingly good if you can make this with just a text prompt.
Ooh wow I hate this. Totally soulless appearance and delivery - and the robot fidgeting the dude is doing with his hands completely distracts from everything else. It’s totally normal to do that movement while speaking for emphasis - but whatever he’s doing does not look normal. (The mouths look nightmarish as well)