However, Google thinks it might be able to solve these problems with a new translation model called Translatotron. The model attempts to capture the tone and cadence of the person talking and apply them to the translation, which will hopefully result in slightly more natural-sounding speech.
According to Google, “By incorporating a speaker encoder network, Translatotron is also able to retain the original speaker’s vocal characteristics in the translated speech, which makes the translated speech sound more natural and less jarring.” That said, judging from the audio samples shared on Google’s blog, it is still obvious that a computer is speaking back to you.
This contrasts with some of Google’s other AI efforts, such as Duplex, which has fooled many into thinking that they were talking to an actual human being. Still, Translatotron is very much a work in progress, so we imagine that it will improve over time, and for now it does seem promising.