2022
DOI: 10.48550/arxiv.2202.11929
Preprint

Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring

Abstract: Recent work on unsupervised speech segmentation has used self-supervised models with a phone segmentation module and a word segmentation module that are trained jointly. This paper compares this joint methodology with an older idea: bottom-up phone-like unit discovery is performed first, and symbolic word segmentation is then performed on top of the discovered units (without influencing the lower level). I specifically describe a duration-penalized dynamic programming (DPDP) procedure that can be used for eith…
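The duration-penalized dynamic programming named in the abstract can be illustrated with a minimal sketch: a DP over candidate boundaries in which each segment pays a scoring cost plus a per-segment duration penalty, so that fewer, longer segments are preferred when their scores allow it. The names (`dpdp_segment`, `segment_cost`) and the use of a constant per-segment penalty are illustrative assumptions, not the paper's implementation; in the paper the segment score would come from a self-supervised model.

```python
def dpdp_segment(frames, segment_cost, dur_penalty=1.0, max_len=10):
    """Sketch of duration-penalized DP segmentation.

    `segment_cost` stands in for a self-supervised scoring function
    (e.g. a reconstruction error over the segment); `dur_penalty` is a
    fixed cost per segment that discourages over-segmentation.
    Returns the segment end indices of the minimum-cost segmentation.
    """
    n = len(frames)
    best = [0.0] + [float("inf")] * n  # best[t]: min cost of segmenting frames[:t]
    back = [0] * (n + 1)               # backpointer to the previous boundary
    for t in range(1, n + 1):
        for s in range(max(0, t - max_len), t):
            cost = best[s] + segment_cost(frames[s:t]) + dur_penalty
            if cost < best[t]:
                best[t], back[t] = cost, s
    # Recover boundaries by walking backpointers from the end.
    bounds, t = [], n
    while t > 0:
        bounds.append(t)
        t = back[t]
    return sorted(bounds)
```

With a toy cost (within-segment variance), `dpdp_segment([0, 0, 0, 5, 5, 5], var_cost, dur_penalty=0.5)` splits the sequence at the change point, since one mixed segment would incur a large variance cost that outweighs the saved penalty.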


Cited by 4 publications (4 citation statements)
References 43 publications (103 reference statements)
“…Here, we took the system described in Algayres et al (2022) out of the box, and we showed good performance in speech segmentation compared to the state of the art, but there was still a large margin of improvement compared to text-based systems. A recent unpublished paper (Kamper, 2022) based on the non-lexical principle came to our attention and showed similar or slightly better results than ours on a subset of the ZR17 languages. Kamper (2022) also uses a segmentation lattice that resembles ours for inference.…”
Section: Conclusion and Open Questions (supporting)
confidence: 76%
“…Strictly speaking, new words refer to the types of words that appear first or are used with new meanings. When dealing with texts, a critical problem lies in the "word segmentation" phase, and almost all subsequent results rely on the first segmentation step (Kamper, 2022). Therefore, the accuracy of word segmentations significantly affects the subsequent processing.…”
Section: The Results Of Discovering New Words (mentioning)
confidence: 99%
“…Further, a language model is utilized with beam search to decode the outputs of the acoustic model. Interestingly, the discrete representations enable the unsupervised discovery of acoustic units where phonemes are automatically mapped to a small set of discrete representations, enabling phoneme discovery and segmentation [54][55][56][57]. This resulting property of automatic discovery of ground truth phonemes is of particular interest, as we hypothesize that it allows us to derive the atomic units of human movements from wearable sensor data by learning a mapping of discrete representations to spans of sensor data.…”
Section: Discrete Representations Learning In Other Domains (mentioning)
confidence: 99%