This paper describes KIT's submission to the IWSLT 2021 Offline Speech Translation Task. We describe systems for both the cascaded and the end-to-end conditions. In the cascaded condition, we investigated different end-to-end architectures for the speech recognition module. For the text segmentation module, we trained a small Transformer-based model on high-quality monolingual data. For the translation module, we reused last year's neural machine translation model. In the end-to-end condition, we improved our Speech Relative Transformer architecture to match or even surpass the results of the cascade system.
This paper studies the problem of analyzing multisatellite constellations with respect to their coverage capacity of areas on Earth's surface. The geometric configuration of constellation projection points on Earth's surface is investigated. A geometric subdivision approach is described, and the coverage target area belonging to each satellite and its maximum circle radius are defined and calculated. Accordingly, the target area can be decomposed into subregions, and thus the multisatellite coverage problem is decomposed into a one-satellite coverage problem. An accurate and effective solution method is proposed that solves both continuous and discontinuous coverage problems for any type of ground area. In addition, a procedure for calculating satellite orbital parameters is also proposed. The performance of our approach is analyzed using the Globalstar system as an example, and it is shown that it compares favorably with the classical grid-point technique and the longitude method.
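The decomposition above reduces the multisatellite coverage question to per-satellite checks against each satellite's coverage circle on the ground. As an illustration only, here is a minimal sketch of that reduction: a ground point is covered if it lies within the coverage circle (great-circle radius) of any satellite subpoint. The helper names, the spherical-Earth assumption, and the fixed shared radius are illustrative assumptions, not the paper's actual subdivision procedure:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    c = (math.sin(p1) * math.sin(p2)
         + math.cos(p1) * math.cos(p2) * math.cos(l2 - l1))
    # Clamp to [-1, 1] to guard against floating-point overshoot in acos.
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, c)))

def covered_by_any(point, subpoints, radius_km):
    """True if the ground point lies inside any satellite's coverage circle.

    point      -- (lat, lon) of the ground point, degrees
    subpoints  -- list of (lat, lon) satellite subsatellite points, degrees
    radius_km  -- coverage-circle radius on the ground, km
    """
    lat, lon = point
    return any(great_circle_km(lat, lon, slat, slon) <= radius_km
               for slat, slon in subpoints)
```

In this simplified view, a target area is covered at an instant when every point (or every subregion representative) passes `covered_by_any`; the paper's method avoids brute-force point checks by subdividing the area geometrically per satellite.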
In this paper, we describe our submission to the Simultaneous Speech Translation task at IWSLT 2022. We explore strategies for using an offline model in a simultaneous setting without modifying the original model. Our experiments show that our onlinization algorithm is almost on par with the offline setting while being 3× faster than the offline model in terms of latency on the test set. We also show that the onlinized offline model outperforms the best IWSLT 2021 simultaneous system in the medium and high latency regimes and is almost on par in the low latency regime. We make our system publicly available.
Pretrained models in the acoustic and textual modalities can potentially improve speech translation for both Cascade and End-to-end approaches. In this evaluation, we empirically investigate this question by using the wav2vec, mBART50, and DeltaLM models to improve our text and speech translation models. The experiments showed that these pretrained models, together with an advanced audio segmentation method, improve over our previous End-to-end system by up to 7 BLEU points. More importantly, they showed that, given enough data and modeling capacity to overcome the training difficulty, End-to-end models can outperform even very competitive Cascade systems. In our experiments, this advantage can be as large as 2.0 BLEU points, the same margin by which Cascade systems have typically led over the years.