DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
Title | DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage |
Publication Type | Conference Paper |
Year of Publication | 2024 |
Authors | Wang K., Herremans D. |
Conference Name | Proc. of IEEE TENCON, Singapore |
Abstract | Laughing, sighing, stuttering, and other forms of paralanguage do not contribute any direct lexical meaning to speech, but they provide crucial propositional context that aids semantic and pragmatic processes such as irony. It is thus important for artificial social agents to both understand and be able to generate speech with semantically important paralanguage. Most speech datasets do not include transcribed non-lexical speech sounds and disfluencies, while those that do are typically multi-speaker datasets where each speaker provides relatively little audio. This makes it challenging to train conversational Text-to-Speech (TTS) synthesis models that include such paralinguistic components. |
URL | https://arxiv.org/abs/2406.08820 |