AI Text-to-Speech May Soon Forget Specific Voices

Text-to-speech technology may soon get a major upgrade in safety and privacy. New research suggests that AI speech models can be taught to “unlearn” how to mimic specific voices, helping prevent misuse while preserving quality for other tasks.

Traditionally, companies use guardrails to stop AI from producing harmful or sensitive output. But even with protections, clever prompting can still unlock unwanted behavior. A new approach—machine unlearning—takes a different route: instead of just blocking certain outputs, it teaches the model to forget specific training data entirely.

This concept is now being tested on text-to-speech models, which can accurately mimic almost any voice from just a few audio samples, even voices absent from the training data. The challenge is getting the model to forget certain voices while still sounding natural with the rest.

To explore this, researchers modified Meta’s VoiceBox model. When asked to recreate a redacted voice, the system now responds using a completely random synthetic voice, instead of the original. According to testing, this process reduces the model’s ability to mimic the redacted voice by over 75%, while only slightly lowering performance—about 2.8%—on voices it’s still allowed to use.
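The redaction behavior described above can be illustrated with a toy optimization. This is a minimal sketch, not the researchers' actual method: it stands in for a TTS model with a simple lookup table of speaker embeddings, and the unlearning step pulls a forgotten speaker toward a random synthetic target while anchoring retained speakers to their original embeddings. All names, dimensions, and learning rates here are illustrative assumptions.

```python
# Toy sketch of voice "unlearning" (illustrative only; real systems
# like VoiceBox are far more complex than a lookup table).
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding dimension (assumption)

# "Trained" model: one voice embedding per speaker.
model = {sid: rng.normal(size=DIM) for sid in ["alice", "bob", "carol"]}
original = {sid: emb.copy() for sid, emb in model.items()}

forget = {"alice"}                      # voice to redact
random_target = rng.normal(size=DIM)    # random synthetic replacement

lr = 0.5
for step in range(100):
    for sid, emb in model.items():
        # Forgotten speakers are pulled toward the random voice;
        # retained speakers are anchored to their originals.
        target = random_target if sid in forget else original[sid]
        # Gradient step on 0.5 * ||emb - target||^2.
        model[sid] = emb - lr * (emb - target)

def mimicry_error(sid, reference):
    """Distance between the model's output and a reference voice."""
    return float(np.linalg.norm(model[sid] - reference))

# After unlearning, requests for "alice" land on the random synthetic
# voice rather than her original, while "bob" is essentially untouched.
print(mimicry_error("alice", original["alice"]))
print(mimicry_error("bob", original["bob"]))
```

The key design point the sketch captures is that unlearning is an optimization over the model itself, not an output filter: after the update, there is no redacted embedding left to recover through clever prompting.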

The breakthrough will be presented at this week’s International Conference on Machine Learning, and a public demo shows just how markedly these “forgotten” voices differ from their originals. It’s a promising step toward making text-to-speech tools safer, smarter, and more ethical.