Siri is best known for its localization. The virtual assistant is available in 24 languages across 36 countries. By comparison, Google's Assistant supports only five languages, and Amazon's Alexa supports just two.
iOS 10.3 introduced yet another language, Shanghainese, extending Apple's lead in the race. In an interview with Reuters, Apple's head of speech explained how Siri learns a new language.
Alex Acero, who joined Apple in 2013 and now leads the speech team, tells the story. He says Siri was originally powered by Nuance, which was later replaced by an Apple in-house voice platform that relies on machine learning to understand voices and words.
To teach Siri a new language, Acero explains, Apple brings in real people who speak it to read various paragraphs and word lists, covering a range of different accents and dialects.
The speech is then recorded and transcribed by other humans, which lets the company verify accuracy and correct errors. This labeled data is then fed into a machine learning model for training.
The resulting speech-recognition model then attempts to predict words on its own, and it improves over time as more data comes in.
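Apple has not published the details of its models, but the pattern described above is classic supervised learning: human-transcribed audio samples train a model that predicts words from acoustic features. As a toy illustration only, here is a minimal sketch using a nearest-centroid classifier over invented two-number "feature vectors" standing in for real audio features:

```python
# Toy sketch of supervised speech training. The feature vectors, words,
# and nearest-centroid "model" are all invented for illustration;
# Apple's actual pipeline and model architecture are not public.

def train(labeled_samples):
    """Average the feature vectors for each word (a nearest-centroid model)."""
    sums, counts = {}, {}
    for features, word in labeled_samples:
        acc = sums.setdefault(word, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[word] = counts.get(word, 0) + 1
    return {word: [v / counts[word] for v in acc] for word, acc in sums.items()}

def predict(model, features):
    """Return the word whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda word: squared_distance(model[word]))

# Fake "acoustic features" paired with human transcriptions.
training_data = [
    ([0.9, 0.1], "hello"), ([0.8, 0.2], "hello"),
    ([0.1, 0.9], "weather"), ([0.2, 0.8], "weather"),
]
model = train(training_data)
print(predict(model, [0.85, 0.15]))  # -> hello
```

Feeding in more transcribed samples simply shifts each word's centroid, which is a simplistic stand-in for how additional labeled data refines a real model over time.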
Rather than jumping straight to Siri, Apple first releases new languages on iOS and macOS as dictation options. This lets the company gather a far wider range of (anonymized) real-world data for its database.
These real-world audio samples naturally contain background noise, pauses, and slurred speech, all of which help Siri handle such input in the future. Again, this data is transcribed by real humans.
These steps are repeated until Apple is confident enough to release the language to Siri. At that point, Apple hires voice actors to record audio so that Siri can give spoken answers when asked a question.