The Siri team explains the process by which Apple's voice assistant is taught new languages.
Human speech is recorded and then transcribed by other humans. This forms a canonical representation of words and how they sound when spoken aloud, produced by real people to ensure accuracy. This raw training data is then fed into a machine learning model.
The computer language model attempts to predict the transcription of arbitrary strings of words. The algorithm can improve automatically over time as it is trained with more data. Apple will tune the data a little internally and then move on to the next step.
Instead of jumping straight to Siri, Apple releases the new language as a feature of iOS and macOS dictation. On the iPhone, dictation is available from the keyboard by pressing the microphone key next to the spacebar. This allows Apple to gain more speech samples (sent anonymously) from a much wider base of people.
These real-world audio clips naturally incorporate background noise and imperfect speech such as coughing, pauses and slurring. Apple has humans transcribe the samples, then uses these newly verified pairings of audio and text as additional input data for the language model. The report says this secondary process cuts the dictation error rate in half.
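The report does not say how Apple measures the dictation error rate. A common metric for speech recognition is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the system's output into the human transcript, divided by the number of words in that transcript. The Python sketch below is an illustration under that assumption, not Apple's method; the function name and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: WER is used here only as a common stand-in metric;
    the report does not specify how Apple scores dictation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: a human transcript and two hypothetical system outputs.
human_transcript = "set a timer for ten minutes"
before_retraining = "set a time for tin minutes"   # two wrong words
after_retraining = "set a timer for tin minutes"   # one wrong word

wer_before = word_error_rate(human_transcript, before_retraining)
wer_after = word_error_rate(human_transcript, after_retraining)
print(f"before: {wer_before:.3f}, after: {wer_after:.3f}")
```

In this invented example the error rate falls from two wrong words in six to one in six, which is what halving the error rate would look like on a single sentence. A real evaluation would average over many verified audio and text pairs.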
Apple repeats this procedure until it feels the system is accurate enough to roll out as a headline Siri feature. Separately, voice actors record speech sequences so that Siri can synthesize audio and perform text-to-speech for its replies.