“…They usually involve extensive preparation steps such as manual tagging (e.g., [41,42]) and training a specific, designated model (e.g., [38,39,41,43,44]). Approaches to speech segmentation based on acoustic signals alone were proposed in [45,46,40,47]. These efforts have been commonly applied to scripted speech (e.g.…”