A team of computer programmers and researchers has come up with a new technique that generates real-time animated speech and offers a better alternative to manual voice animation. The new capability was unveiled by programmers from the University of East Anglia, Caltech and Carnegie Mellon University and researchers from Disney at SIGGRAPH 2017 in Los Angeles.
The researchers developed a deep learning approach that involved training a neural network to take "recorded words from a voice actor, predict the mouth shape needed, and animate a character to sync the speech". This is a vast improvement over manual voice animation, which requires skilled animators to match a character's mouth movements to the spoken lines by hand.
While developing the method, the programmers recorded audio and video of a voice actor reciting over 2,500 phonetically diverse sentences. They tracked the speaker's face to create a reference face for an animation model and then transcribed the audio into distinct units of sound, known as phonemes, using off-the-shelf speech recognition software.
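To make the idea of a frame-by-frame phoneme sequence concrete, here is a minimal Python sketch that expands timed phoneme segments into one label per video frame. The function name `phonemes_to_frames`, the `(phoneme, start, end)` segment format, the frame rate and the "sil" silence label are all assumptions made for illustration; the researchers' actual tooling is not described here.

```python
def phonemes_to_frames(segments, fps=30):
    """Expand timed phoneme segments into one label per video frame.

    segments: list of (phoneme, start_sec, end_sec) tuples, sorted by time
              and non-overlapping (a hypothetical format, not the
              researchers' own).
    Frames not covered by any segment are labelled "sil" (silence).
    """
    if not segments:
        return []
    # Total frame count is set by the end time of the last segment.
    n_frames = round(segments[-1][2] * fps)
    frames = ["sil"] * n_frames
    for phoneme, start, end in segments:
        first = round(start * fps)
        last = min(round(end * fps), n_frames)
        for i in range(first, last):
            frames[i] = phoneme
    return frames


# Example: the word "hello" as four timed phonemes, sampled at 10 frames per second.
hello = [("HH", 0.0, 0.1), ("AH", 0.1, 0.2), ("L", 0.2, 0.3), ("OW", 0.3, 0.5)]
print(phonemes_to_frames(hello, fps=10))
```

Each entry in the resulting list corresponds to one frame of animation, which is the kind of input a frame-by-frame model would consume.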
Finally, they plugged all of that information into a neural network and generated a model that could "animate the reference face from a frame-by-frame sequence of phonemes". This animation was then mapped onto a CG character, which could deliver dialogue in real time.
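One plausible way to turn a frame-by-frame phoneme sequence into smooth mouth motion is to slide a fixed-size window over the frames, predict a mouth shape for every frame in each window, and average the overlapping predictions. The Python sketch below illustrates that idea only. The sliding-window scheme, the single "mouth openness" value per frame, and the lookup table standing in for the trained network are all assumptions for illustration, not details taken from this report.

```python
# Hypothetical stand-in for a trained network: phoneme -> mouth openness in [0, 1].
MOUTH_OPENNESS = {"sil": 0.0, "M": 0.0, "AH": 0.8, "OW": 0.6, "L": 0.3, "HH": 0.4}


def predict_window(window):
    """Placeholder predictor: one mouth-openness value per frame in the window."""
    return [MOUTH_OPENNESS.get(p, 0.5) for p in window]


def animate(frames, window_size=3, predict=predict_window):
    """Slide a window over the frame labels and average overlapping predictions."""
    n = len(frames)
    if n == 0:
        return []
    window_size = min(window_size, n)
    totals = [0.0] * n  # sum of predictions received by each frame
    counts = [0] * n    # number of windows that covered each frame
    for start in range(n - window_size + 1):
        outputs = predict(frames[start:start + window_size])
        for offset, value in enumerate(outputs):
            totals[start + offset] += value
            counts[start + offset] += 1
    return [t / c for t, c in zip(totals, counts)]


# Example: one mouth-openness value per frame for a short phoneme sequence.
print(animate(["sil", "HH", "AH", "L", "OW", "OW", "sil"]))
```

Averaging across overlapping windows is what would let a real model, whose output for a frame depends on the neighbouring phonemes, produce transitions that blend from one mouth shape to the next rather than snapping between them.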
Although the neural network takes only a couple of hours to train, the researchers say the approach works for any speaker, and can even animate speech in foreign languages and songs.
"Our results so far show that our approach achieves state-of-the-art performance in visual speech animation," said lead researcher Dr Sarah Taylor in a statement. "The real beauty is that it is very straightforward to use and easy to edit and stylize the animation using standard production editing software".
The ability to generate natural animated speech for any style of character could be a major boon for Disney, other leading production houses and the gaming industry. Characters could deliver their dialogue on the fly with much more realism than is currently possible.
Highlighting the importance of top-notch speech animation, Taylor noted, "Realistic speech animation is essential for effective character animation. Done badly, it can be distracting and lead to a box office flop".