Imagine creating a podcast where Mark Zuckerberg interviews Elon Musk – using their actual voices?
What sounds like science fiction is now reality.
Voice-Pro is an open-source Gradio WebUI that breaks the boundaries of audio manipulation.
Powered by cutting-edge Whisper engines, this tool turns voice replication into child's play.
Key Features:
- Zero-shot Voice Cloning
- Voice Changer with 50+ Celebrity Voices
- YouTube Audio Downloading
- Vocal Isolation
- Multi-Language Text-to-Speech (Edge-TTS, F5-TTS)
- Multi-Language Translation
- Powered by Whisper Engines (Whisper, Faster-Whisper, Whisper-Timestamped)
Video Demos:
1. Voice-Pro Usage Tutorial: https://youtu.be/z8g8LMhoh_o
2. Voice Cloning Celebrity Podcast Demo: https://youtu.be/Wfo7vQCD4no
3. Full Demo Playlist: https://www.youtube.com/playlist?list=PLwx5dnMDVC9Y7dAjm9r26...
Whether you're a content creator, developer, or audio experiment enthusiast,
Voice-Pro provides a user-friendly interface to push the boundaries of audio manipulation.
I do think that voice cloning for personal usage has actual genuine uses - in fact there was a relatively interesting news article about a person who was irrevocably losing their voice who had their vocal pattern cloned.
https://www.voanews.com/a/illness-took-away-her-voice-ai-cre...
That being said, it does seem a bit bizarre that the repo's home page is proudly trumpeting the ability to co-opt other people's identities without their permission (and yes your unique vocal pattern is definitely part of your identity - I mean it's used in some forms of biometric data). They're doing the project a bit of a disservice.
I don't have much real use for celebrity voices (other than fun experimentation), but I'd love to be able to clone my own voice and character voices for the purposes of creating audiobooks / audioplays without having to pay monthly fees with monthly usage limits. So I'm excited by this sort of project!
P.S. Are there any tools for synthetic voice creation? Maybe melding two or more voices together, or just exploring latent space? Would be fun for character creation to create completely new voices.
I'd be interested as well. This is where I imagine the space is going - particularly as the potential for litigation increases around cloning.
Game studios will spin up a bunch of unique virtual voices for all the dialogue of extras. It'll probably be longer before we see replacements of main characters though. There's been some research in speech-to-speech transference as well - this means that company employee A records the character B's line with the appropriate emotional nuance (angry, sad, etc.) and the emotional aspect is copied on top of the generated TTS.
Have you tried ElevenLabs? I used that. I had to record 3 hours of training audio, reading books and news articles, but the result was really good.
I’ve used tortoise tts before and trained it on my voice and a mix of voices. It’s not perfect but still impressive.
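On the "melding two or more voices" question: a minimal sketch, assuming a TTS model that conditions on a fixed-length speaker embedding. The function name and the 4-dimensional vectors are made up for illustration (real embeddings are much larger and come from the model's own speaker encoder), and nothing here is the API of any particular tool. The idea is just a weighted average in embedding space.

```python
from typing import List, Sequence


def blend_voices(embeddings: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Weighted average of speaker embeddings (hypothetical helper)."""
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalise by the weight total so the result stays on the same scale.
    return [
        sum(w * e[i] for e, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]


# Made-up embeddings standing in for two cloned voices.
voice_a = [0.0, 1.0, 2.0, 4.0]
voice_b = [2.0, 1.0, 0.0, 0.0]

# 75% voice A, 25% voice B.
blended = blend_voices([voice_a, voice_b], [3.0, 1.0])
print(blended)  # → [0.5, 1.0, 1.5, 3.0]
```

Sweeping the weights gives a range of in-between voices. Whether the result sounds natural depends entirely on the model, so treat this as a starting point for experimentation.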
These tools make it very easy to scam vulnerable people, and have pretty limited use otherwise.
They are pretty good for leaving messages for my blind friend. I generally find calling / voice texts a waste of time (I type and read far faster than I talk or listen, not to mention the ability to reread etc), but my blind friend prefers getting voice messages when on his phone and this works for us. I type and send and when he comes back with something, Whisper makes it into text for me.
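For anyone wanting to replicate the receiving half of that workflow, here is a rough sketch using the openai-whisper Python package. The function name and the "base" model choice are my own assumptions, not anything Voice-Pro ships, and it needs `pip install openai-whisper` plus ffmpeg on the machine.

```python
from pathlib import Path


def transcribe_voice_message(audio_path: str, model_name: str = "base") -> str:
    """Turn a received voice message into text (hypothetical helper)."""
    if not Path(audio_path).is_file():
        raise FileNotFoundError(audio_path)
    # Imported lazily so the file check above works without the package.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)   # downloads weights on first use
    result = model.transcribe(audio_path)    # dict with "text" and "segments"
    return result["text"].strip()
```

Calling `transcribe_voice_message("reply.ogg")` on a saved voice message returns the transcript as a plain string. Larger models are slower but tend to cope better with noisy phone recordings.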
I'm absolutely using celebrity voices for my Home Assistant voice. Amazon has spent the last couple years removing the voices for Alexa that people had paid for.
To be fair, they’ve got pretty serious potential for letting tech companies get paid for a seasoned voice actor’s unique delivery, tone, inflection, etc rather than the voice actor themselves.
> they’ve got pretty serious potential for letting tech companies get paid for a seasoned voice actor’s unique delivery, tone, inflection, etc rather than the voice actor themselves.
I think you mean "steal the labor of an actor"?
Sure, and people that already agree with you will feel good reading it, but other people who don’t agree see it as an attack. It’s pretty much impossible to slip a new idea into someone’s mind if your approach made them slam the door before even considering it. So what’s the benefit of saying it like that?
It calls attention to the ethical implications of using a part of someone else's personal identity without their direct involvement.
I like tools like these cause they make zero trust default even more obvious, and their "pretty limited use" is saving people hours of work.
Gen AI space to everyone else: “Your computer scientists were so preoccupied with whether or not they should, they didn’t stop to think if they could just do it anyway”
How many victims will it take for lawmakers to do something about this?
It's already illegal to scam somebody. While it's always positive to protect people more, what can be done here? Any alternative I can imagine is massively oppressive of the current state of the software industry.
You can regulate large companies, you can regulate published software sold for profit, but it's impossible to regulate free and open source tools.
You essentially have to regulate access to computing power if you want to prevent bad actors doing bad things using these sort of tools.
>You can regulate large companies, you can regulate published software sold for profit, but it's impossible to regulate free and open source tools.
Regulation means putting legal limitations on things. If it were impossible to regulate free and open source tools, it would also be impossible to regulate murder and lots of other things. It turns out it isn't impossible: sure, murder happens, but people get caught and punished for it.
Sorry, but this argument is much like the early internet triumphalism - back when people said it was impossible to regulate. Turns out lots of countries now regulate it.
Lots of countries impose exactly what specific regulations with respect to open source tooling?
The closest thing I can think of is maybe the regulation of DRM ripping tools, but they're still out there in the wild and determined actors can easily get ahold of them. So I'm not at all confident that regulation will have any measurable meaningful effect.
The fable of the "determined actor".
The "determined actor" can get bombs, tanks, fissile material. There, no one says "WHELP they can get it anyway so why bother regulating it LMAO" - somehow this is different for anything not physical?
It depends on what you do with the tool. Going with your murder analogy, if there's a stabbing epidemic what do you do? 1) Ban knives 2) invest in public safety 3) investigate the root causes and improve on them?
I'm also not sure what's so regulated about the internet besides net neutrality in certain countries. Of course the government can put limits on the network, like banning services, but it's easy since they are rather easy to target. With content traveling on the network it's much harder to say if it's legit or not.
> lots of countries
What about those countries that don't regulate it and people will keep pumping out better, leaner and faster models from there? Spreading software is trivial, all you achieve is the public won't be aware of what's possible.
The more I think about it, if anything should be regulated, it's a requirement to provide a third-party (probably government-backed) ID verification system, so that my mom could know it's really me calling. Basically, kill caller ID spoofing.
Bulldozing grandma is just the cost of technological progress /s
This tech is not only great for bulldozing grandma, it's great at stealing content from other creators and rebranding it as your own. Based on the GitHub page, it kind of seems like that's exactly what's being advertised as the use case. Steal content from the BBC, cut it up, pull out the noise or vocals, and revoice the content so the algorithm can't detect the plagiarism easily. Image detection is nowhere near audio detection for copyright strikes.
There is a massive problem with this on YouTube. Pretty much every category on YouTube now has a host of these bots trolling content and playing the YouTube strike system like a banjo. There are channels dedicated to showing you how to set up these content mills. This tool can make you good money.
This tech is going to be ubiquitous, it's just too easy to distribute it. Grandma had better start adapting now.
Because people make it so, not because the natural order of the world gets us there
For some reason, "because we can" validates that we should. Any jackass has the power of a research team of PhDs. It's kinda weird.
Indeed. Humans ascended to dominance because we can cooperate. This every-man-for-themself idea is an aberration, not the natural order as so many claim. It’s rather astounding to think otherwise considering the logistics of how we’re communicating right now.
Cooperation works if the potential damage caused by a rogue actor is sufficiently low. Otherwise, it's too easy to sabotage things. This is why we don't want random rogue states to have nukes. AI will give so much leverage to rogue actors that it will significantly shift the game theory in favour of not cooperating.
> Cooperation works if the potential damage caused by a rogue actor is sufficiently low. Otherwise, it's too easy to sabotage things. This is why we don't want random rogue states to have nukes. AI will give so much leverage to rogue actors that it will significantly shift the game theory in favour of not cooperating.
Governments successfully collectively controlling dangerous things so they don’t fall into the hands of rogue bad actors fundamentally opposes the extreme individualist every-man-for-himself perspective in every conceivable way. It’s the absolute opposite of “it’s everybody’s responsibility to protect themselves because everybody else is only going to look out for themselves.”
And when individuals have that much leverage, collective action is the only conceivable way to oppose it. Some of those things might be cultural, like mores, some might be laws, some might be more martial. I don’t see how extreme individualism even theoretically could be more powerful.
Are you suggesting government action against putting up code like this to GitHub? It’s ok if you are, but I want to put into more concrete terms what we’re talking about.
Demanding responsible behaviour from everybody is not going to work. Some people don't care about negative externalities that much and it's enough if only a few of them decide not to play ball. So either grandma needs to adapt which will upset some people or distributing the tech should be regulated/prosecuted which will upset another group of people.
I think either way grandma needs to adapt though since Russian scammers and trolls are still going to run scams with fake voices.
You can’t adapt around brain age making it more difficult to distinguish truth from lies.
Yeah, I don't really get the hullabaloo. If granny doesn't have the mental fortitude to keep up with the times, she shouldn't be managing her own money. I guess better her son/daughter than a scammer, but both are better than letting money rot. Put granny on food stamps, pay $1 for her rent-controlled housing, and be done with it.
Quit being a doomer or keep it to yourself. This reminds me of the sound boards that were popular in the early 2000s except way more versatile. Some things are just good for people to have fun and THAT'S OKAY.
People are allowed to recognize the realistic negative outcomes of technology, especially on a forum that frequently discusses the tradeoffs of modern, cutting edge technologies.
So many AI posts are overrun with this kind of complaining from folks with limited imaginations.
On a forum that frequently discusses technology with enthusiasm you'd think there'd be more enthusiasm and more constructive criticism instead of blanket write-offs.
I would argue that being able to see the drawbacks and potential negative externalities of a new technology is not a sign of a "limited imagination", but quite the contrary. An actual display of a limited imagination is the inability to imagine how a new technology can (and will) be abused in society by bad actors.
Just a heads up, this is a trial: you have to pay to use it after 30 minutes.
Easier and (cheaper?) to just use elevenlabs.
It’s a bit of a hassle, but after closing the Windows command, you can restart the program and use it indefinitely. The results you worked on will still remain in the workspace folder.
Have you considered supporting whisper-at - https://github.com/YuanGongND/whisper-at ? Being able to identify sounds on a timeline can be useful, e.g. for a politician's speech and how the audience is reacting to it (clapping, applauding).
Is there speech to speech? I have been hoping for a model I can use to do voice acting with inflection
Do you mean Inflection's Pi?
I think they mean speech "in the style of" the same as repaint this picture in the style of Van Gogh, so they will do the audio and put the correct inflection on things but then rerender it with the voice of Katharine Hepburn for example.
on edit: example of course showing the difficulty as so much of Hepburn was her inflection.
> When Windows Defender mistakenly recognizes a [virus] as a Trojan, this is often called a 'False Positive'. To solve this problem, you can go through the following steps:
Yeah, I also noticed the install instructions are to run this batch file that gets administrator access and starts downloading things…
It's not any worse than all the projects on GitHub with an "easy" install instruction of "curl ... | sudo sh". Heck, even an innocent "sudo make install" command can easily contain a malicious payload.
It's not really the sort of tool that should require admin rights though.
Yeah, it’s not great, but it’s definitely not unusual. And Windows reputation-based execution blocking does have false positives. I work for a company that has some very, very popular products and some that only see a few dozen downloads per week, and despite being signed, new versions still take a while to build enough reputation to not trigger the block.
Project looks interesting. Are there short-term plans to support macOS?
If not, any recommendations for alternative projects?
Looks cool! Also, is there a reason you went with a Web-UI instead of making a native desktop app?
The real utility of something like this is for reducing the creative costs of voice acting, i.e. something like this is a massive boon for mod-makers, where making fully voiced anything is a huge undertaking. While my friends and family could probably provide their voices if I asked, getting a decent recording and performance out of them is just not going to be possible.
But if I can get the performance I want and shift it to another voice, then fully voicing free works becomes very accessible (even better would be generative AI which could take a sample of what you want and re-render it into something which sounds like a more professional performance - voice in-fill I suppose).
Are there any TTS models which are decent but can work on devices without a GPU and with relatively low RAM (4 GB)?
There are a bunch of YC start-ups who are building new models and stuff in the space. I fear they are going to get decimated really soon as the quality of local llamas keeps improving.
> Imagine creating a podcast where Mark Zuckerberg interviews Elon Musk – using their actual voices?
I'm imagining it. It sucks to imagine.
I'm imagining it being used to scam people. I'm imagining it to leech off of performers who have worked very hard to build a recognizable voice (and it is a lot of work to speak like a performer). I'm imagining how this will be used in revenge porn. I'm imagining how this will be used to circumvent access to voice controlled things.
This is bad. You should feel bad.
And I know you are thinking, "Wait, but I worked really hard on this!" Sorry, I appreciate that it might be technically impressive, but you've basically come out with "we've invented a device that mixes bleach and ammonia automatically in your bedroom! It's so efficient at mixing those two, we can fill a space with chlorine gas in under 10 seconds! Imagine a world where every bedroom could become a toxic site with only the push of a button."
That this is posted here, proudly, is quite frankly astoundingly embarrassing for you.
Without Linux support it is going to have a very limited audience.
There is nothing in here that precludes you from running this on any OS that supports python + CUDA. They use miniconda for installation of python and python packages, but this could just as easily be a venv + system CUDA install or even better: a container. This is only one tiny Dockerfile away from running anywhere.
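To make the venv route concrete, here is a minimal sketch using only Python's standard `venv` module. The `create_env` helper and the `.venv` directory name are my own choices, not part of the project's installer, and this deliberately says nothing about which packages to install afterwards, since that list lives in the project's own install scripts.

```python
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    root = Path(env_dir)
    # Windows puts the interpreter under Scripts\, POSIX systems under bin/.
    candidates = [root / "Scripts" / "python.exe", root / "bin" / "python"]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"no interpreter found under {root}")
```

Calling `create_env(".venv")` returns the interpreter path, and you would then install the project's dependencies with that interpreter's `-m pip` against a system-wide CUDA install. A container is still the tidier option if you want the CUDA runtime pinned as well.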