AI Text-to-Speech May Soon Forget Specific Voices

Text-to-speech technology may soon get a major upgrade in safety and privacy. New research suggests that AI speech models can be taught to “unlearn” how to mimic specific voices, helping prevent misuse while preserving quality for other tasks.

Traditionally, companies use guardrails to stop AI from producing harmful or sensitive output. But even with protections, clever prompting can still unlock unwanted behavior. A new approach—machine unlearning—takes a different route: instead of just blocking certain outputs, it teaches the model to forget specific training data entirely.

This concept is now being tested on text-to-speech models, which can accurately mimic almost any voice from just a few audio samples, even voices absent from the training data. The challenge is getting the model to forget certain voices while still sounding natural with the rest.

To explore this, researchers modified Meta’s VoiceBox model. When asked to recreate a redacted voice, the system now responds using a completely random synthetic voice, instead of the original. According to testing, this process reduces the model’s ability to mimic the redacted voice by over 75%, while only slightly lowering performance—about 2.8%—on voices it’s still allowed to use.
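The redaction behavior described above can be illustrated with a toy optimization. This is a minimal sketch, not the researchers' actual method: it stands in for a TTS model with a simple lookup table of speaker embeddings, and the unlearning step pulls a forgotten speaker toward a random synthetic target while anchoring retained speakers to their original embeddings. All names, dimensions, and learning rates here are illustrative assumptions.

```python
# Toy sketch of voice "unlearning" (illustrative only; real systems
# like VoiceBox are far more complex than a lookup table).
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding dimension (assumption)

# "Trained" model: one voice embedding per speaker.
model = {sid: rng.normal(size=DIM) for sid in ["alice", "bob", "carol"]}
original = {sid: emb.copy() for sid, emb in model.items()}

forget = {"alice"}                      # voice to redact
random_target = rng.normal(size=DIM)    # random synthetic replacement

lr = 0.5
for step in range(100):
    for sid, emb in model.items():
        # Forgotten speakers are pulled toward the random voice;
        # retained speakers are anchored to their originals.
        target = random_target if sid in forget else original[sid]
        # Gradient step on 0.5 * ||emb - target||^2.
        model[sid] = emb - lr * (emb - target)

def mimicry_error(sid, reference):
    """Distance between the model's output and a reference voice."""
    return float(np.linalg.norm(model[sid] - reference))

# After unlearning, requests for "alice" land on the random synthetic
# voice rather than her original, while "bob" is essentially untouched.
print(mimicry_error("alice", original["alice"]))
print(mimicry_error("bob", original["bob"]))
```

The key design point the sketch captures is that unlearning is an optimization over the model itself, not an output filter: after the update, there is no redacted embedding left to recover through clever prompting.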

The breakthrough will be presented at this week’s International Conference on Machine Learning, and a public demo shows just how markedly these “forgotten” voices differ from their originals. It’s a promising step toward making text-to-speech tools safer, smarter, and more ethical.