[CES 2014] With the arrival of devices like the Moto X, a new class of phones was born: phones that are always keeping an ear out for their master’s next voice command. We are certain that this trend will continue, and it will amplify as devices get smart enough to anticipate the needs of their owners. We’re not quite there yet, but along the way, it is crucial to have voice processors that provide a clear signal for both humans and machines to understand, while consuming as little power as possible as they continuously listen to their surroundings. This is a feat that very few companies can deliver, but that’s exactly what the Audience es700 Series architecture was designed to do.
The es700 Series will comprise several products. The es750 and es754 are for high-end, powerful devices that listen at all times for new commands. The es702 and es704 have the same capabilities, except that they don’t come with a CODEC (coder-decoder), since some of Audience’s customers may prefer to use a third-party CODEC or their own.
The first thing that matters is voice processing quality. We’re no strangers to Audience’s capabilities in that area, and this is something the company seems able to improve incrementally. Its performance has been backed up by a number of studies and tests, and if you have recorded audio with a Galaxy Note 2, you can hear how good it is compared to other phones that don’t use the same level of voice processing.
The es700 Series has been designed to analyze an audio stream without killing battery life. This is where specialized hardware really helps, since the goal is to leave the main application processor (something like a Snapdragon, Tegra or Atom) asleep until there is a need to take action.
To achieve this, it is important for Audience (or anyone else who wants to do this) to limit the scope of the voice commands. Here, the chip can hold 5 programmable phrases as keywords (“OK Google” counts as one phrase, for example, but anything relatively short should work). In “listening” mode, the chip draws a tiny 0.5 mA, and the main processor is awakened only when the chip thinks it has picked up one of those phrases. A low false-positive rate is critical: if the main processor were awoken too often for nothing (ambient noise, TV…), the power drain would quickly become unsustainable.
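To see why the false-positive rate matters so much, here is a back-of-the-envelope sketch in Python. Only the 0.5 mA listening current comes from Audience’s figures; the application processor’s current while awake (300 mA) and how long it stays awake per false trigger (2 seconds) are hypothetical round numbers chosen for illustration, so treat the results as orders of magnitude, not measurements.

```python
from dataclasses import dataclass


@dataclass
class PowerModel:
    # 0.5 mA is the listening-mode figure quoted for the es700 Series.
    listen_ma: float = 0.5
    # HYPOTHETICAL: current drawn by the application processor while awake.
    ap_wake_ma: float = 300.0
    # HYPOTHETICAL: seconds the application processor stays awake per wake-up.
    ap_wake_seconds: float = 2.0


def daily_mah(model: PowerModel, false_wakes_per_hour: float) -> float:
    """Charge (mAh) used per day by always-on listening plus false wake-ups."""
    listening = model.listen_ma * 24.0
    wakes_per_day = false_wakes_per_hour * 24.0
    # Each wake-up costs current * time; divide seconds by 3600 to get hours.
    wake_cost = wakes_per_day * model.ap_wake_ma * model.ap_wake_seconds / 3600.0
    return listening + wake_cost


def break_even_wakes_per_hour(model: PowerModel) -> float:
    """False wake-ups per hour at which waking costs as much as listening itself."""
    return model.listen_ma * 3600.0 / (model.ap_wake_ma * model.ap_wake_seconds)


if __name__ == "__main__":
    m = PowerModel()
    print(daily_mah(m, 0))               # 12.0 (mAh/day, listening alone)
    print(daily_mah(m, 60))              # 252.0 (one false wake per minute)
    print(break_even_wakes_per_hour(m))  # 3.0
```

Under these assumed numbers, just three false triggers per hour would double the daily drain, and one per minute would multiply it by 21, which is the kind of runaway cost described above.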
Finally, while looking at the specifications, I found the wind noise suppression very interesting. Wind typically saturates the audio signal, which makes it hard to process further down the line. With a high dynamic range, complete saturation is less likely to happen. There’s also a new speakerphone mode that can pick up incoming sound over 360 degrees. This is useful if you hold your phone in an odd position, or if you use the smartphone as a conference phone on a table. Most handsets are not optimized for situations like these.
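The benefit of extra dynamic range can be illustrated with a toy Python sketch. It mixes a quiet 1 kHz tone (standing in for speech) with a loud 20 Hz rumble (standing in for wind), then counts how many samples hit the input’s full-scale limit. The amplitudes and the two full-scale settings are hypothetical values for illustration, not es700 specifications, and real wind noise is far less regular than a sine wave.

```python
import math


def synth_wind_and_speech(rate_hz: int = 16000, seconds: float = 1.0) -> list[float]:
    """HYPOTHETICAL test signal: quiet 1 kHz 'speech' tone plus loud 20 Hz 'wind' rumble."""
    n_samples = int(rate_hz * seconds)
    return [
        0.1 * math.sin(2 * math.pi * 1000 * n / rate_hz)
        + 1.5 * math.sin(2 * math.pi * 20 * n / rate_hz)
        for n in range(n_samples)
    ]


def clip(samples: list[float], full_scale: float) -> list[float]:
    """Hard-limit samples to [-full_scale, full_scale], as a saturating input stage would.

    Any sample pinned at the limit no longer carries the quiet speech tone.
    """
    return [max(-full_scale, min(full_scale, s)) for s in samples]


def clipped_fraction(samples: list[float], full_scale: float) -> float:
    """Fraction of samples whose magnitude reaches or exceeds full scale."""
    return sum(1 for s in samples if abs(s) >= full_scale) / len(samples)


if __name__ == "__main__":
    signal = synth_wind_and_speech()
    # 2.0 represents about 6 dB more headroom than 1.0 (20 * log10(2) is roughly 6.02).
    for full_scale in (1.0, 2.0):
        print(full_scale, round(clipped_fraction(signal, full_scale), 3))
```

With a full scale of 1.0, roughly half of the samples end up pinned at the rail; with twice the headroom none do, so a downstream noise suppressor still sees the speech riding on top of the wind.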