We investigate the following question for machine translation (MT): can we develop a single universal MT model to serve as a common seed from which to obtain derivative, improved models for arbitrary language pairs? We propose mRASP, an approach to pre-training a universal multilingual neural machine translation model. The key idea of mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space. We pre-train an mRASP model jointly on 32 language pairs using only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models. We carry out extensive experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs. Experimental results demonstrate that mRASP achieves significant performance improvements over models trained directly on those target pairs. To the best of our knowledge, this is the first work to verify that multiple low-resource language pairs can be utilized to improve rich-resource MT. Surprisingly, mRASP even improves translation quality for exotic languages that never occur in the pre-training corpus. Code, data, and pre-trained models are available at https://github.com/linzehui/mRASP.
* Equal contribution. The work was done when the first author was an intern at ByteDance.
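To make the random aligned substitution idea concrete, here is a minimal sketch: source tokens are randomly replaced with dictionary translations in other languages, so that synonyms across languages appear in shared contexts during pre-training. The toy dictionary, the substitution probability, and the function name are illustrative assumptions, not the authors' actual resources (mRASP builds on bilingual dictionaries such as MUSE).

```python
import random

# Hypothetical toy bilingual dictionary: maps a source token to candidate
# translations in other languages. An assumption for illustration only.
toy_dict = {
    "hello": ["bonjour", "hola"],
    "world": ["monde", "mundo"],
}

def random_aligned_substitution(tokens, dictionary, prob=0.3):
    """Randomly replace source tokens with a dictionary translation.

    Tokens with similar meanings across languages then share training
    contexts, pulling their representations closer together.
    The probability value is an assumption, not the paper's setting.
    """
    out = []
    for tok in tokens:
        if tok in dictionary and random.random() < prob:
            out.append(random.choice(dictionary[tok]))
        else:
            out.append(tok)
    return out

print(random_aligned_substitution("hello world".split(), toy_dict))
# e.g. ['bonjour', 'world']
```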
Existing multilingual machine translation approaches mainly focus on English-centric directions, while non-English directions still lag behind. In this work, we aim to build a many-to-many translation system with an emphasis on the quality of non-English directions. Our intuition is based on the hypothesis that a universal cross-language representation leads to better multilingual translation performance. To this end, we propose mRASP2, a training method that obtains a single unified multilingual translation model. mRASP2 is empowered by two techniques: a) a contrastive learning scheme that closes the gap among representations of different languages, and b) data augmentation on both parallel and monolingual data to further align token representations. For English-centric directions, mRASP2 outperforms the existing best unified model and achieves performance competitive with or better than the pre-trained and fine-tuned model mBART on tens of WMT translation directions. For non-English directions, mRASP2 achieves an average improvement of more than 10 BLEU over the multilingual Transformer baseline. Code, data, and trained models are available at https://github.com/PANXiao1994/mRASP2.
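The abstract does not spell out the contrastive objective, so the following is a minimal sketch of one plausible instantiation: an InfoNCE-style loss over pooled encoder representations of parallel sentences, where each source sentence is pulled toward its own translation and pushed away from other sentences in the batch. The temperature value, the pooling choice, and the use of in-batch negatives are assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(src_repr, tgt_repr, temperature=0.1):
    """InfoNCE-style loss over sentence representations.

    src_repr, tgt_repr: (batch, dim) pooled encoder outputs of parallel
    sentence pairs. Row i of src_repr and row i of tgt_repr are
    translations of each other; all other rows serve as negatives.
    The temperature is an assumed value, not the paper's setting.
    """
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature              # (batch, batch) cosine sims
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)

# Usage with random stand-in representations:
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```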
Objectives: To evaluate the diagnostic efficiency of deep-learning models in distinguishing malignant from benign parotid tumors on plain computed tomography (CT) images.
Materials and methods: CT images of 283 patients with parotid tumors were collected and analyzed retrospectively. Of these, 150 tumors were benign and 133 were malignant according to pathology results. A total of 917 regions of interest of parotid tumors were cropped (456 benign and 461 malignant). Three deep-learning networks (ResNet50, VGG16_bn, and DenseNet169) were used for diagnosis (approximately 3:1 split for training and testing). The diagnostic efficiencies (accuracy, sensitivity, specificity, and area under the curve [AUC]) of the three networks were calculated and compared on the 917 images. To simulate the process of human diagnosis, a voting model was added at the end of the networks, and the 283 tumors were classified as benign or malignant. Meanwhile, the 917 tumor images were classified by two radiologists (A and B), and the original CT images were classified by radiologist B. The diagnostic efficiencies of the three deep-learning network models (after voting) and the two radiologists were calculated.
Results: On the 917 CT images, ResNet50 showed high accuracy and sensitivity for diagnosing malignant parotid tumors; the accuracy, sensitivity, specificity, and AUC were 90.8%, 91.3%, 90.4%, and 0.96, respectively. On the 283 tumors, the accuracy, sensitivity, and specificity of ResNet50 (after voting) were 92.3%, 93.5%, and 91.2%, respectively.
Conclusion: ResNet50 showed high sensitivity in distinguishing malignant from benign parotid tumors on plain CT images, making it a promising auxiliary method for screening malignant parotid tumors.
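The abstract does not specify the voting rule used to aggregate per-region predictions into a per-tumor diagnosis. A simple majority vote over the predictions for all regions of interest belonging to one tumor is one plausible reading; the function below is a hypothetical sketch under that assumption.

```python
from collections import Counter

def vote_tumor_label(roi_predictions):
    """Aggregate per-ROI predictions ('benign'/'malignant') for one tumor
    by majority vote. Assumed aggregation rule; the paper's voting model
    may differ (e.g., probability averaging or a weighted scheme)."""
    counts = Counter(roi_predictions)
    return counts.most_common(1)[0][0]

print(vote_tumor_label(["malignant", "benign", "malignant"]))  # -> malignant
```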