The Accuracy of Speech Recognition Dictation Software for Minority Language Users: A Case Study
To determine the role that discourse plays in shaping the output of speech recognition dictation software, I asked four participants, three of whom were minority language speakers (one non-native speaker of English and two speakers of African American Vernacular English, or AAVE), to fully enroll in an automatic speech recognition dictation program and then compose a series of texts over time using IBM’s ViaVoice 98 dictation software. I designed the task protocol to require these participants to compose two genres of writing (a summary and a response) based on the same stimulus text. To gauge software learning, I used recognition accuracy as the evaluation criterion and the word error rate (WER) as the measure. I then calculated the software’s reduction in transcription error from the first speech session to the last. These calculations indicated the robustness of the software’s speech engine and gave some indication of how the participants’ discourse (accent, speech style, and genre of speaking/writing) interacted with the software’s transcription.
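The abstract does not state how WER or the reduction in error was computed. As a rough illustration only, the sketch below assumes the conventional definition of WER (word-level substitutions, deletions, and insertions divided by the number of words in the reference text) and assumes the reduction is expressed relative to the first session. The function names `word_error_rate` and `relative_error_reduction` are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / reference length.

    Assumption: words are split on whitespace and compared
    case-insensitively; the study's own scoring rules may differ.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(first_wer: float, last_wer: float) -> float:
    """Assumed measure: share of the first-session error removed by the last."""
    if first_wer <= 0:
        raise ValueError("first-session WER must be positive")
    return (first_wer - last_wer) / first_wer
```

Under these assumptions, a dictated sentence transcribed as "the cat sat on a mat" against the intended "the cat sat on the mat" has one substitution in six reference words, so its WER is 1/6.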
With respect to the role that the participants’ discourse played in shaping the transcription accuracy of the software, results indicated that the software reacted differently to the various ethno-linguistic speech backgrounds represented in this study. In terms of WER measures, the software recognized the speaker of “General American” best. This participant proved himself a “sheep” in that he maintained a consistent pronunciation style throughout the study. The non-native speaker of English had the second-lowest average WER, followed very closely by one of the AAVE speakers. In terms of text type, the responses seemed to yield more accurate transcriptions than the summaries. The participants reported that the responses “flowed easier” and were more “spoken like.” The interaction of text type and recognition accuracy is not a well-researched phenomenon and may be a fruitful area for further inquiry.
Keywords: Speech Recognition, Discourse Analysis, Genre Studies
Dr. Elizabeth Meddeb
Assistant Professor, Foreign Languages/ESL/Humanities, York College