
Imagine creating a podcast where Mark Zuckerberg interviews Elon Musk – using their actual voices?

What sounds like science fiction is now reality.

Voice-Pro is an open-source Gradio WebUI that breaks the boundaries of audio manipulation.

Powered by cutting-edge Whisper engines, this tool turns voice replication into child's play.

Key Features:

- Zero-shot Voice Cloning

- Voice Changer with 50+ Celebrity Voices

- YouTube Audio Downloading

- Vocal Isolation

- Multi-Language Text-to-Speech (Edge-TTS, F5-TTS)

- Multi-Language Translation

- Powered by Whisper Engines (Whisper, Faster-Whisper, Whisper-Timestamped)

Video Demos:

1. Voice-Pro Usage Tutorial: https://youtu.be/z8g8LMhoh_o

2. Voice Cloning Celebrity Podcast Demo: https://youtu.be/Wfo7vQCD4no

3. Full Demo Playlist: https://www.youtube.com/playlist?list=PLwx5dnMDVC9Y7dAjm9r26...

Whether you're a content creator, developer, or audio experiment enthusiast, Voice-Pro provides a user-friendly interface to push the boundaries of audio manipulation.

GitHub: https://github.com/abus-aikorea/voice-pro

vunderba 41 minutes ago

I do think that voice cloning for personal usage has actual genuine uses - in fact there was a relatively interesting news article about a person who was irrevocably losing their voice who had their vocal pattern cloned.

https://www.voanews.com/a/illness-took-away-her-voice-ai-cre...

That being said, it does seem a bit bizarre that the repo's home page is proudly trumpeting the ability to co-opt other people's identities without their permission (and yes your unique vocal pattern is definitely part of your identity - I mean it's used in some forms of biometric data). They're doing the project a bit of a disservice.

shannifin 2 hours ago

I don't have much real use for celebrity voices (other than fun experimentation), but I'd love to be able to clone my own voice and character voices for the purposes of creating audiobooks / audioplays without having to pay monthly fees with monthly usage limits. So I'm excited by this sort of project!

P.S. Are there any tools for synthetic voice creation? Maybe melding two or more voices together, or just exploring latent space? Would be fun for character creation to create completely new voices.

vunderba 45 minutes ago

I'd be interested as well. This is where I imagine the space is going - particularly as the potential for litigation increases around cloning.

Game studios will spin up a bunch of unique virtual voices for all the dialogue of extras. It'll probably be longer before we see replacements of main characters though. There's been some research in speech-to-speech transference as well - this means that company employee A records the character B's line with the appropriate emotional nuance (angry, sad, etc.) and the emotional aspect is copied on top of the generated TTS.

thelittleone 1 hour ago

Have you tried ElevenLabs? I used that. Had to record 3 hours of training audio reading books and news articles. But the result was really good.

shannifin 30 minutes ago

They're great! They just cost too much for how much output I want.

dyauspitr 2 hours ago

I’ve used tortoise tts before and trained it on my voice and a mix of voices. It’s not perfect but still impressive.

muglug 3 hours ago

These tools make it very easy to scam vulnerable people, and have pretty limited use otherwise.

anonzzzies 8 minutes ago

They are pretty good for leaving messages for my blind friend. I generally find calling / voice texts a waste of time (I type and read far faster than I talk or listen, not to mention the ability to reread etc), but my blind friend prefers getting voice messages when on his phone and this works for us. I type and send and when he comes back with something, Whisper makes it into text for me.
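
The speech-to-text half of that workflow can be sketched with the open-source `openai-whisper` Python package. This is only a minimal sketch, not the commenter's actual setup: the `base` model size and the helper names are assumptions made for illustration, and running the transcription step requires `pip install openai-whisper` plus ffmpeg on the system.

```python
def transcribe_voice_message(path: str, model_name: str = "base") -> str:
    """Transcribe an audio file to text with openai-whisper."""
    # Imported lazily so this module still loads where whisper is not installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(path)         # dict with "text" and "segments"
    return result["text"]


def clean_transcript(text: str) -> str:
    """Collapse runs of whitespace so the transcript reads as a single line."""
    return " ".join(text.split())
```

A call would look like `clean_transcript(transcribe_voice_message("reply.ogg"))`, where `reply.ogg` is a hypothetical file name for the received voice message.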

Larrikin 3 hours ago

I'm absolutely using celebrity voices for my Home Assistant voice. Amazon has spent the last couple years removing the voices for Alexa that people had paid for.

chefandy 3 hours ago

To be fair, they’ve got pretty serious potential for letting tech companies get paid for a seasoned voice actor’s unique delivery, tone, inflection, etc rather than the voice actor themselves.

whaaaaat 1 hour ago

> they’ve got pretty serious potential for letting tech companies get paid for a seasoned voice actor’s unique delivery, tone, inflection, etc rather than the voice actor themselves.

I think you mean "steal the labor of an actor"?

chefandy 1 hour ago

Sure, and people that already agree with you will feel good reading it, but other people who don’t agree see it as an attack. It’s pretty much impossible to slip a new idea into someone’s mind if your approach made them slam the door before even considering it. So what’s the benefit of saying it like that?

gmueckl 57 minutes ago

It calls attention to the ethical implications of using a part of someone else's personal identity without their direct involvement.

casey2 47 minutes ago

I like tools like these cause they make zero trust default even more obvious, and their "pretty limited use" is saving people hours of work.

chefandy 3 hours ago

Gen AI space to everyone else: “Your computer scientists were so preoccupied with whether or not they should, they didn’t stop to think if they could just do it anyway”

ranger_danger 3 hours ago

How many victims will it take for lawmakers to do something about this?

tiborsaas 2 hours ago

It's already illegal to scam somebody. While it's always positive to protect people more, what can be done here? Any alternative I can imagine is massively oppressive of the current state of the software industry.

You can regulate large companies, you can regulate published software sold for profit, but it's impossible to regulate free and open source tools.

You essentially have to regulate access to computing power if you want to prevent bad actors doing bad things using these sort of tools.

bryanrasmussen 1 hour ago

>You can regulate large companies, you can regulate published software sold for profit, but it's impossible to regulate free and open source tools.

Regulation is putting legal limitations on things, if it is impossible to regulate free and open source tools then it would be impossible to regulate murder and lots of other things, but it turns out it isn't impossible, sure - murder happens - but people get caught for it and punished.

Sorry, but this argument is much like the early internet triumphalism - back when people said it was impossible to regulate. Turns out lots of countries now regulate it.

vunderba 53 minutes ago

Lots of countries impose exactly what specific regulations with respect to open source tooling?

The closest thing I can think of is maybe the regulation of DRM ripping tools, but they're still out there in the wild and determined actors can easily get ahold of them. So I'm not at all confident that regulation will have any measurable meaningful effect.

notTooFarGone 39 minutes ago

The fable of the "determined actor".

The "determined actor" can get bombs, tanks, fissile material. There no one says "WHELP they can get it anyway so why bother regulating it LMAO" - somehow this is different for anything not physical?

tiborsaas 1 hour ago

It depends on what you do with the tool. Going with your murder analogy, if there's a stabbing epidemic what do you do? 1) Ban knives 2) invest in public safety 3) investigate the root causes and improve on them?

I'm also not sure what's so regulated about the internet besides net neutrality in certain countries. Of course the government can put limits on the network, like banning services, but it's easy since they are rather easy to target. With content traveling on the network it's much harder to say if it's legit or not.

> lots of countries

What about those countries that don't regulate it and people will keep pumping out better, leaner and faster models from there? Spreading software is trivial, all you achieve is the public won't be aware of what's possible.

The more I think about it, if anything should be regulated, it's a requirement to provide a third-party (probably government-backed) ID verification system so it would be possible for my mom to know it's me calling. Basically kill caller ID spoofing.

russell_h 2 hours ago

Serious question: what do you think lawmakers should do?

tsujamin 3 hours ago

Bulldozing grandma is just the cost of technological progress /s

weq 59 minutes ago

This tech is not only great for bulldozing grandma, it's great at stealing content from other creators and rebranding it as your own. Based on the GitHub page, it kind of seems like that's exactly what's being advertised as the use case. Steal content from the BBC, cut it up and pull out the noise/vocals/revoice the content so the algorithm can't detect the plagiarism easily. The image detection is nowhere near the audio detection for copyright strikes.

There is a massive problem with this on YouTube. Pretty much every category on YouTube now has a host of these bots trawling content and playing the YouTube strike system like a banjo. There are channels dedicated to showing you how to set up these content mills. This tool can make you good money.

uh_uh 3 hours ago

This tech is going to be ubiquitous, it's just too easy to distribute it. Grandma better start adapting now.

thejazzman 3 hours ago

Because people make it so, not because the natural order of the world gets us there

For some reason, "because we can" validates that we should. Any jackass has the power of a research team of PhDs. It's kinda weird.

chefandy 3 hours ago

Indeed. Humans ascended to dominance because we can cooperate. This every-man-for-themself idea is an aberration, not the natural order as so many claim. It’s rather astounding to think otherwise considering the logistics of how we’re communicating right now.

uh_uh 3 hours ago

Cooperation works if the potential damage caused by a rogue actor is sufficiently low. Otherwise, it's too easy to sabotage things. This is why we don't want random rogue states to have nukes. AI will give so much leverage to rogue actors that it will significantly shift the game theory in favour of not cooperating.

chefandy 1 hour ago

> Cooperation works if the potential damage caused by a rogue actor is sufficiently low. Otherwise, it's too easy to sabotage things. This is why we don't want random rogue states to have nukes. AI will give so much leverage to rogue actors that it will significantly shift the game theory in favour of not cooperating.

Governments successfully collectively controlling dangerous things so they don’t fall into the hands of rogue bad actors fundamentally opposes the extreme individualist every-man-for-himself perspective in every conceivable way. It’s the absolute opposite of “it’s everybody’s responsibility to protect themselves because everybody else is only going to look out for themselves.”

And when individuals have that much leverage, collective action is the only conceivable way to oppose it. Some of those things might be cultural, like mores, some might be laws, some might be more martial. I don’t see how extreme individualism even theoretically could be more powerful.

uh_uh 55 minutes ago

Are you suggesting government action against putting up code like this to GitHub? It’s ok if you are, but I want to put into more concrete terms what we’re talking about.

uh_uh 3 hours ago

Demanding responsible behaviour from everybody is not going to work. Some people don't care about negative externalities that much and it's enough if only a few of them decide not to play ball. So either grandma needs to adapt which will upset some people or distributing the tech should be regulated/prosecuted which will upset another group of people.

rockemsockem 1 hour ago

I think either way grandma needs to adapt though since Russian scammers and trolls are still going to run scams with fake voices.

chefandy 3 hours ago

You can’t adapt around brain age making it more difficult to distinguish truth from lies.

casey2 40 minutes ago

Yeah, I don't really get the hullabaloo. If granny doesn't have the mental fortitude to keep up with the times, she shouldn't be managing her own money. I guess better her son/daughter than a scammer, but both are better than letting money rot. Put granny on food stamps, pay $1 for her rent-controlled housing, and be done with it.

rockemsockem 1 hour ago

Quit being a doomer or keep it to yourself. This reminds me of the sound boards that were popular in the early 2000s except way more versatile. Some things are just good for people to have fun and THAT'S OKAY.

whaaaaat 1 hour ago

People are allowed to recognize the realistic negative outcomes of technology, especially on a forum that frequently discusses the tradeoffs of modern, cutting edge technologies.

rockemsockem 39 minutes ago

So many AI posts are overrun with this kind of complaining from folks with limited imaginations.

On a forum that frequently discusses technology with enthusiasm you'd think there'd be more enthusiasm and more constructive criticism instead of blanket write-offs.

Mordisquitos 13 minutes ago

I would argue that being able to see the drawbacks and potential negative externalities of a new technology is not a sign of a "limited imagination", but quite the contrary. An actual display of a limited imagination is the inability to imagine how a new technology can (and will) be abused in society by bad actors.

wingworks 40 minutes ago

Just a heads up, this is a trial; you have to pay to use it after 30 minutes.

Easier and (cheaper?) to just use elevenlabs.

vulcanidic 23 minutes ago

It’s a bit of a hassle, but after closing the Windows command, you can restart the program and use it indefinitely. The results you worked on will still remain in the workspace folder.

harryf 2 hours ago

Have you considered supporting whisper-at - https://github.com/YuanGongND/whisper-at ? Being able to identify sounds on a timeline can be useful, e.g. for a politician's speech and how the audience is reacting to it (clapping, applauding).

jncfhnb 4 hours ago

Is there speech to speech? I have been hoping for a model I can use to do voice acting with inflection

amrrs 3 hours ago

Do you mean Inflection's Pi?

bryanrasmussen 1 hour ago

I think they mean speech "in the style of" the same as repaint this picture in the style of Van Gogh, so they will do the audio and put the correct inflection on things but then rerender it with the voice of Katharine Hepburn for example.

on edit: example of course showing the difficulty as so much of Hepburn was her inflection.

yawnxyz 3 hours ago

> When Windows Defender mistakenly recognizes a [virus] as a Trojan, this is often called a 'False Positive'. To solve this problem, you can go through the following steps:

kfarr 3 hours ago

Yeah, I also noticed the install instructions are to run this batch file that gets administrator access and starts downloading things…

gruez 3 hours ago

It's not any worse than all the projects on GitHub with "easy" install instructions of "curl ... | sudo sh". Heck, even an innocent "sudo make install" command can easily contain a malicious payload.

tonyedgecombe 10 minutes ago

It's not really the sort of tool that should require admin rights though.

chefandy 3 hours ago

Yeah it’s not great but it’s definitely not unusual. And windows reputation-based execution blocking does have false positives. I work for a company that has some very very popular products and some that only see a few dozen downloads per week, and despite being signed, it still takes a while for new versions to build enough rep to not trigger the block.

safeimp 3 hours ago

Project looks interesting. Are there short-term plans to support macOS?

If not, any recommendations for alternative projects?

grahamgooch 1 hour ago

Great stuff, well done. What is your latency for real-time audio?

joshdavham 2 hours ago

Looks cool! Also, is there a reason you went with a Web-UI instead of making a native desktop app?

XorNot 38 minutes ago

The real utility of something like this is for reducing the creative costs of voice-acting. i.e. something like this is a massive boon for mod-makers where making fully voiced anything is a huge undertaking - i.e. while my friends and family could probably provide their voice if I asked, getting a decent recording and performance out of them is just not going to be possible.

But if I can get the performance I want and shift it to another voice, then fully voicing free works becomes very accessible (even better would be generative AI which could take a sample of what you want and re-render it into something which sounds like a more professional performance - voice in-fill I suppose).

newusertoday 2 hours ago

Are there any TTS models which are decent but can work on devices without a GPU and with relatively low RAM (4GB)?

ilrwbwrkhv 3 hours ago

There are a bunch of YC start-ups who are building new models and stuff in the space. I fear they are going to get decimated really soon as the quality of local llamas keeps improving.

whaaaaat 1 hour ago

> Imagine creating a podcast where Mark Zuckerberg interviews Elon Musk – using their actual voices?

I'm imagining it. It sucks to imagine.

I'm imagining it being used to scam people. I'm imagining it to leech off of performers who have worked very hard to build a recognizable voice (and it is a lot of work to speak like a performer). I'm imagining how this will be used in revenge porn. I'm imagining how this will be used to circumvent access to voice controlled things.

This is bad. You should feel bad.

And I know you are thinking, "Wait, but I worked really hard on this!" Sorry, I appreciate that it might be technically impressive, but you've basically come out with "we've invented a device that mixes bleach and ammonia automatically in your bedroom! It's so efficient at mixing those two, we can fill a space with chlorine gas in under 10 seconds! Imagine a world where every bedroom could become a toxic site with only the push of a button."

That this is posted here, proudly, is quite frankly astoundingly embarrassing for you.

farzd 1 hour ago

You do realise this is not the first AI release to clone voices?

cess11 1 hour ago

Sure, and PoisonIvy wasn't the first RAT. So what? Does it get more ethical to assist fraudsters and so on once more people are doing it?

trallnag 34 minutes ago

Soyjak?

aboardRat4 1 hour ago

Without Linux support it is going to have a very limited audience.

okwhateverdude 51 minutes ago

There is nothing in here that precludes you from running this on any OS that supports python + CUDA. They use miniconda for installation of python and python packages, but this could just as easily be a venv + system CUDA install or even better: a container. This is only one tiny Dockerfile away from running anywhere.
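
The venv route mentioned above can be sketched with Python's standard `venv` module. This is only an illustration under stated assumptions: the `voice-env` directory name and the `create_env` helper are hypothetical, and the project's actual dependency list and CUDA setup are not covered here.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target: Path) -> Path:
    """Create a virtual environment at `target` and return its config file path."""
    # with_pip=False keeps the sketch fast and offline; pass with_pip=True
    # when you actually intend to install packages into the environment.
    venv.EnvBuilder(with_pip=False).create(target)
    return target / "pyvenv.cfg"


# Hypothetical location; a real setup would use a persistent directory.
env_dir = Path(tempfile.mkdtemp()) / "voice-env"
cfg = create_env(env_dir)
```

From there, packages would be installed with the environment's own interpreter rather than through miniconda; a container image would wrap the same steps.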