A new deep learning model, MelNet, can produce human intonation with uncanny accuracy.
Once trained, it can reproduce anybody’s voice in samples lasting a few seconds.
Researchers demonstrate how precisely it can clone Bill Gates’ voice.
Scientists at Facebook AI Research have developed a method to overcome the limitations of existing text-to-speech systems. They have built a generative model, named MelNet, that can produce human intonation with uncanny accuracy. In fact, it can speak fluently in anybody’s voice.
MelNet can produce unconditional speech and music samples that remain consistent over several seconds. It is also capable of conditional speech generation and fully end-to-end text-to-speech synthesis.
On the downside, the technology raises the specter of a new era of fake audio content. Like other advances in artificial intelligence, it poses more ethical questions than it answers.