- Learn shared representations from weighted modality-specific representations: Gated Multimodal Unit (GMU) [429]; parallel attention model; attention layer; sparse MLP (mixes vertical and horizontal information via weight sharing and sparse connections); multimodal encoder-decoder; multimodal factorized bilinear pooling (combines the compact output features of multimodal low-rank bilinear pooling [430] with the robustness of multimodal compact bilinear pooling [431]); multi-head intermodal attention fusion; transformer [295]; feed-forward network; low-rank multimodal fusion network [432]. Used in [62, 65, 67, 76, 93, 100, 102, 106, 113, 117, 131, 135, 136, 142-144, 174, 218, 433].
- Learn joint sparse representations: dictionary learning [20].
- Learn and fuse outputs from different modality-specific parts at fixed time steps: cell-coupled LSTM with L-skip fusion mechanism [101].
- Learn cross-modality representations that incorporate interactions between modalities: LXMERT [434]; transformer encoder with cross-attention layers (representations of one modality serve as queries and those of the other as keys/values, and vice versa); memory fusion network [435]. Used in [82, 92, 129].
- Horizontal and vertical kernels to capture patterns across different levels: CASER [309]. Used in [170].
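To make the first strategy concrete, the gating idea behind the GMU [429] can be sketched as follows. This is a minimal illustration, not the reference implementation: the modality names (`x_v`, `x_t`), dimensions, and weight variables are hypothetical, following the commonly used formulation in which a learned gate `z` forms a convex, per-dimension combination of the two modality-specific candidate representations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gmu(x_v, x_t, W_v, W_t, W_z):
    """Gated Multimodal Unit (sketch): the gate z weighs, per output
    dimension, how much each modality contributes to the shared
    representation. Variable names here are illustrative."""
    h_v = np.tanh(W_v @ x_v)                              # candidate from modality 1 (e.g. visual)
    h_t = np.tanh(W_t @ x_t)                              # candidate from modality 2 (e.g. textual)
    z = sigmoid(W_z @ np.concatenate([x_v, x_t]))         # gate in (0, 1), conditioned on both inputs
    return z * h_v + (1.0 - z) * h_t                      # convex combination of the candidates

# Toy usage with illustrative dimensions (3-d and 5-d inputs, 4-d shared space).
rng = np.random.default_rng(0)
x_v = rng.standard_normal(3)
x_t = rng.standard_normal(5)
W_v = rng.standard_normal((4, 3))
W_t = rng.standard_normal((4, 5))
W_z = rng.standard_normal((4, 8))
h = gmu(x_v, x_t, W_v, W_t, W_z)                          # shared 4-d representation
```

Because the output is a convex combination of two `tanh` activations, every component of `h` stays within (-1, 1); the gate lets the model lean on whichever modality is more informative for each dimension.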