It is crucial that automatic speech recognition (ASR) systems can handle the wide range of variation in the speech of speakers from different demographic groups, with different speaking styles, and with (dis)abilities. A potential quality-of-service harm arises when ASR systems do not perform equally well for everyone. ASR systems may exhibit bias against certain types of speech, for example speech with a non-native accent or speech from particular age groups or genders. In this study, we evaluate two widely used neural network-based architectures, Wav2vec2 and Whisper, for potential biases across speakers of Dutch. As a test set we used the Dutch speech corpus JASMIN, which contains read speech and conversational speech recorded in a human-machine interaction setting. The results reveal a significant bias against non-native speakers, children, elderly speakers, and speakers of some regional dialects. The ASR systems generally perform slightly better for women than for men.
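A per-group comparison of this kind can be sketched as follows. This is a minimal illustration, not the study's evaluation code: the abstract does not name its metric, so word error rate (WER) is an assumption here, and the function names, group labels, and toy sentences are all hypothetical.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples):
    """Aggregate WER per group.

    samples: iterable of (group, reference, hypothesis) triples.
    Assumption: WER is pooled per group (total errors / total reference
    words), not averaged per utterance.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[group] += e
        words[group] += n
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


# Toy, made-up transcripts; the group labels are hypothetical.
toy_samples = [
    ("native", "de kat zit op de mat", "de kat zit op de mat"),
    ("non_native", "de kat zit op de mat", "de kat zit op mat"),
]
toy_scores = wer_by_group(toy_samples)
```

Pooling errors over all reference words in a group keeps long utterances from being underweighted. A gap between two groups' scores would then be read as a performance difference, subject to a significance test that this sketch does not include.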
Abstract of poster presentation. Our student interpreters feel ill-prepared for assignments that involve sign-supported speech (Anonymous, 2015). This is probably because there is no single way of communicating in sign-supported speech (Sutton-Spence & Woll, 1999). Our study investigates whether and how we could prepare our students within a four-year bachelor curriculum.
When an adult claims he cannot sleep without his teddy bear, people tend to react with surprise. Language interpretation is thus influenced by social context, such as who the speaker is. The present study reveals inter-individual differences in brain reactivity to social aspects of language. Whereas women showed brain reactivity when stereotype-based inferences about a speaker conflicted with the content of the message, men did not. This sex difference in social information processing can be explained by a specific cognitive trait: one's ability to empathize. Individuals who empathize to a greater degree showed larger N400 effects (as well as a larger increase in γ-band power) in response to socially relevant information. These results indicate that individuals with high empathizing skills are able to rapidly integrate information about the speaker with the content of the message, as they make use of voice-based inferences about the speaker to process language in a top-down manner. In contrast, individuals with lower empathizing skills did not use information about social stereotypes in implicit sentence comprehension, but rather took a more bottom-up approach to processing these social pragmatic sentences.