This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta↔En mixed domain subtasks, Ru↔Ja news commentary translation task, and En→Hi multi-modal translation task. For the WAT2019, 25 teams participated in the shared tasks. We also received 10 research paper submissions out of which 7 1 were accepted. About 400 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.
This paper presents the results of the shared tasks from the 8th workshop on Asian translation (WAT2021). For the WAT2021, 28 teams participated in the shared tasks and 24 teams submitted their translation results for the human evaluation. We also accepted 5 research papers. About 2,100 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.
Experiments on various word segmentation approaches for the Burmese language are conducted and discussed in this note. Specifically, dictionary-based, statistical, and machine learning approaches are tested. Experimental results demonstrate that statistical and machine learning approaches perform significantly better than dictionary-based approaches. We believe that this note, based on an annotated corpus of relatively considerable size (containing approximately a half million words), is the first systematic comparison of word segmentation approaches for Burmese. This work aims to discover the properties and proper approaches to Burmese textual processing and to promote further researches on this understudied language.
Pine wilt disease (PWD) is a serious threat to pine forests. Combining unmanned aerial vehicle (UAV) images and deep learning (DL) techniques to identify infected pines is the most efficient method to determine the potential spread of PWD over a large area. In particular, image segmentation using DL obtains the detailed shape and size of infected pines to assess the disease’s degree of damage. However, the performance of such segmentation models has not been thoroughly studied. We used a fixed-wing UAV to collect images from a pine forest in Laoshan, Qingdao, China, and conducted a ground survey to collect samples of infected pines and construct prior knowledge to interpret the images. Then, training and test sets were annotated on selected images, and we obtained 2352 samples of infected pines annotated over different backgrounds. Finally, high-performance DL models (e.g., fully convolutional networks for semantic segmentation, DeepLabv3+, and PSPNet) were trained and evaluated. The results demonstrated that focal loss provided a higher accuracy and a finer boundary than Dice loss, with the average intersection over union (IoU) for all models increasing from 0.656 to 0.701. From the evaluated models, DeepLLabv3+ achieved the highest IoU and an F1 score of 0.720 and 0.832, respectively. Also, an atrous spatial pyramid pooling module encoded multiscale context information, and the encoder–decoder architecture recovered location/spatial information, being the best architecture for segmenting trees infected by the PWD. Furthermore, segmentation accuracy did not improve as the depth of the backbone network increased, and neither ResNet34 nor ResNet50 was the appropriate backbone for most segmentation models.
Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in the field of bioinformatics due to its crucial role in understanding host-pathogen interaction and developing better therapeutic targets against the pathogens. However, the recognition of all effector proteins by using traditional experimental approaches is often time-consuming and laborious. Therefore, development of computational methods to accurately predict putative novel effectors is important in reducing the number of biological experiments for validation. In this study, we proposed a method, called iT3SE-PX, to identify T3SEs solely based on protein sequences. First, three kinds of features were extracted from the position-specific scoring matrix (PSSM) profiles to help train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm was performed to rank these features based on their classification ability. Finally, the optimal features were selected as inputs to a support vector machine (SVM) classifier to predict T3SEs. Based on the two benchmark datasets, we conducted a 100-time randomized 5-fold cross validation (CV) and an independent test, respectively. The experimental results demonstrated that the proposed method achieved superior performance compared to most of the existing methods and could serve as a useful tool for identifying putative T3SEs, given only the sequence information.
A rule-based pre-ordering approach is proposed for statistical Japanese-to-English machine translation using the dependency structure of source-side sentences. A Japanese sentence is pre-ordered to an English-like order at the morpheme level for a statistical machine translation system during the training and decoding phase to resolve the reordering problem. In this article,
extra-chunk
pre-ordering of morphemes is proposed, which allows Japanese functional morphemes to move across chunk boundaries. This contrasts with the intra-chunk reordering used in previous approaches, which restricts the reordering of morphemes within a chunk. Linguistically oriented discussions show that correct pre-ordering cannot be realized without extra-chunk movement of morphemes. The proposed approach is compared with five rule-based pre-ordering approaches designed for Japanese-to-English translation and with a language independent statistical pre-ordering approach on a standard patent dataset and on a news dataset obtained by crawling Internet news sites. Two state-of-the-art statistical machine translation systems, one phrase-based and the other hierarchical phrase-based, are used in experiments. Experimental results show that the proposed approach outperforms the compared approaches on automatic reordering measures (Kendall’s τ, Spearman’s ρ, fuzzy reordering score, and test set RIBES) and on the automatic translation precision measure of test set BLEU score.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.