2022 | DOI: 10.3389/fnins.2022.912798
Multimodal explainable AI predicts upcoming speech behavior in adults who stutter

Abstract: A key goal of cognitive neuroscience is to better understand how dynamic brain activity relates to behavior. Such dynamics, in terms of spatial and temporal patterns of brain activity, are directly measured with neurophysiological methods such as EEG, but can also be indirectly expressed by the body. Autonomic nervous system activity is the best-known example, but muscles in the eyes and face can also index brain activity. Mostly parallel lines of artificial intelligence research show that EEG and facial musc…
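The abstract describes fusing EEG with facial-activity signals in an explainable predictive model. As a minimal, hypothetical sketch of that idea (not the authors' pipeline): synthetic EEG band-power and facial-muscle features are concatenated, a scikit-learn logistic regression predicts upcoming fluent versus stuttered speech, and permutation importance stands in for the explainability step. Feature counts, labels, and data are all illustrative assumptions.

    # Hypothetical sketch (not the paper's pipeline): early fusion of EEG and
    # facial features with a model-agnostic explanation step. All data are
    # synthetic; feature counts (32 EEG, 8 facial) are illustrative assumptions.
    import numpy as np
    from sklearn.inspection import permutation_importance
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_trials = 400
    eeg = rng.normal(size=(n_trials, 32))    # e.g. band-power features per channel
    face = rng.normal(size=(n_trials, 8))    # e.g. facial-muscle activity features
    # Synthetic labels: 1 = upcoming stuttered utterance, 0 = fluent
    y = (0.8 * eeg[:, 0] + 0.6 * face[:, 0] + rng.normal(size=n_trials) > 0).astype(int)

    X = np.hstack([eeg, face])               # early (feature-level) fusion
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))

    # Permutation importance per feature, summed within each modality,
    # as a generic stand-in for the paper's explainability analysis.
    imp = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)
    print("EEG contribution: ", imp.importances_mean[:32].sum())
    print("face contribution:", imp.importances_mean[32:].sum())

Summing importances within each modality gives a coarse check on whether EEG or facial features drive the prediction; the paper's actual explainability method may differ.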


Cited by 5 publications (5 citation statements)
References 86 publications
“…In the medical domain, multimodal data are often complementary with each other, meaning that each type of data (e.g. images, sensor data, text report) can be used to extract unique latent representations to allow a better understanding of a pathology [160]. …” The quoted passage continues into the survey's comparison table, whose rows, flattened during extraction, can be recovered as:

# | Authors | Year | Approach | Task | Datasets
2 | Martini et al [160] | 2021 | Predictive | Seizure Forecasting | Private
3 | Saeed et al [161] | 2021 | Multiple | Multiple | HHAR [144], MobiAct [147], MotionSense [149], UCI HAR [146], HAPT [162], Sleep-EDF [131], MIT Driver DB [163], WiFi CSI [164]
4 | Spathis et al [165] | 2021 | Predictive (HR forecasting) | Subject Health | Private
5 | Thiam et al [166] | 2021 | Generative | Pain Classification | BioVid heat pain [167], SenseEmotion [168]
6 | Das et al [169] | 2022 | Predictive | Stuttering prediction | Private
7 | Deldari et al [170] | 2022 | Contrastive (COCOA) | Multiple | UCI HAR [146], SLEEP-EDF, PAMAP2 [171], WE-SAD [87], Opportunity [172]
8 | Lemkhenter et al [173] | 2022 | Predictive (PhaseSwap) | … | …
Section: Discussion and Open Challenges
confidence: 99%
“…The last two selected multimodal approaches combine biosignals with video recordings. Leveraging a combination of EEG and facial activity data extracted from video, Das et al [169] trained an explainable AI model to predict upcoming speech stuttering. Also, Martini et al [160] showed the potentiality of multimodal self-supervised learning by combining stereoencephalography (SEEG) and video data to forecast seizure events in drug resistant epileptic subjects.…”
Section: Multimodal Self-supervised Learning With Biosignals
confidence: 99%
“…Leveraging a combination of EEG and facial activity data extracted from video, Das et al. [185] trained an explainable AI model to predict upcoming speech stuttering, while Martini et al. [178] showed the potentiality of multimodal self-supervised learning by combining stereoencephalography (SEEG) and video data to forecast seizure events in drug resistant epileptic subjects.…”
Section: Multimodal Self-supervised Learning With Biosignals
confidence: 99%
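The two statements above both point to contrastive multimodal self-supervised learning over paired biosignal and video streams. Below is a minimal, hypothetical PyTorch sketch of that family of methods (not COCOA or any cited paper's architecture): two small encoders map paired EEG-feature and video-embedding windows into a shared space, and a symmetric InfoNCE-style loss aligns matched pairs. Encoder sizes, dimensions, and the random tensors are illustrative assumptions.

    # Hypothetical sketch of cross-modal contrastive pretraining: paired
    # EEG-feature and video-embedding windows are mapped into a shared space
    # and aligned with a symmetric InfoNCE-style loss. Dimensions and the
    # random tensors are illustrative assumptions, not any cited architecture.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Encoder(nn.Module):
        def __init__(self, in_dim, emb_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

        def forward(self, x):
            return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

    def info_nce(z_a, z_b, temperature=0.1):
        # Row i of z_a and row i of z_b come from the same time window.
        logits = z_a @ z_b.t() / temperature         # (batch, batch) similarities
        targets = torch.arange(z_a.size(0))
        # Symmetric cross-entropy: each modality must identify its own pair.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

    eeg_enc = Encoder(in_dim=32)                     # e.g. EEG feature windows
    vid_enc = Encoder(in_dim=512)                    # e.g. video frame embeddings
    opt = torch.optim.Adam(
        list(eeg_enc.parameters()) + list(vid_enc.parameters()), lr=1e-3)

    eeg_batch = torch.randn(16, 32)                  # synthetic paired windows
    vid_batch = torch.randn(16, 512)
    loss = info_nce(eeg_enc(eeg_batch), vid_enc(vid_batch))
    loss.backward()
    opt.step()
    print("contrastive loss:", float(loss))

After pretraining along these lines, the frozen encoders' embeddings could feed a small downstream classifier, e.g. for seizure forecasting or stuttering prediction.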
“…2.1.3 Dysfluent Speech Recognition. Technical work on improving speech assistants for PWS has focused on ASR models [8,23,31,35,50,51,61], stuttering detection [43], dysfluency detection or classification [22,40,42,48,56], clinical assessment [11], and dataset development [12,37,42,55]. Shonibare et al [61] and Mendelev et al [50] investigate training end-to-end RNN-T ASR models on speech from PWS.…”
Section: Overview of Speech Recognition Systems
confidence: 99%
“…Research on speech technology for PWS has largely focused on technical improvements to automatic speech recognition (ASR) models [31,35,50,51,61], dysfluency detection [22,40,42,48], and dataset development [12,37,42,55]. This body of work has largely lacked a human-centered approach to understanding the experiences that PWS have with speech recognition systems [17], which could in turn inform how to prioritize and advance technical improvements.…”
Section: Introduction
confidence: 99%