2023
DOI: 10.3390/app13031340
Explore Long-Range Context Features for Speaker Verification

Abstract: Multi-scale context information, especially long-range dependency, has been shown to be beneficial for speaker verification (SV) tasks. In this paper, we propose three methods to systematically explore long-range context SV feature extraction based on ResNet and analyze their complementarity. Firstly, the Hierarchical-split block (HS-block) is introduced to enlarge the receptive fields (RFs) and extract long-range context information over the feature maps of a single layer, where the multi-channel feature maps are …

Cited by 3 publications (3 citation statements)
References 32 publications
“…The back-end scoring method then measures the similarity of linguistic representations to determine the language to which the utterance belongs. Recently, numerous studies have integrated these two stages into a single end-to-end neural module [14][15][16][17][18]. A good language embedding extractor is crucial for robust, high-performance LID systems.…”
Section: Introduction
confidence: 99%
“…However, despite the temptation to assert that linear sequence models are superior, properly testing for information retention on long-context tasks remains challenging. While some works have attempted to evaluate this ability through long contexts (Shaham et al., 2022; Pang et al., 2022; Dong et al., 2024; Bai et al., 2023; Li et al., 2023; Han et al., 2024), whether these tasks truly require the use of long contexts is uncertain, and ascertaining long-context abilities from them is difficult. This has prompted the use of more synthetic tasks (Hsieh et al., 2024), such as needle-in-a-haystack (NIAH) (Kamradt, 2023) and passkey retrieval (Mohtashami and Jaggi, 2023), to better control and evaluate the context sizes of models.…”
Section: Introduction
confidence: 99%
“…For example, Hsieh et al. (2024) claim that modern LLMs significantly overstate their true context windows on a number of synthetic tasks. Meanwhile, Han et al. (2024) observe that models perform reasonably well on synthetic tasks but struggle on real-world tasks, as do Li et al. (2023). Hence, despite a consistent trend of models underperforming, it remains to be understood why this occurs.…”
Section: Introduction
confidence: 99%