Spanish and Chinese are two very different languages in all language levels. Therefore, translation (both human and machine translation) from one to another and learning one of them as a foreign language are challenging tasks. Some automatic translation systems exist for this pair of languages, but there is enough room to improve the translation quality between Spanish and Chinese. In addition, the accessible sources, such as a parallel corpus for studying and understanding this language pair, are still few. In this paper, we present how we have created a Spanish-Chinese parallel corpus designed for language learning and translation tasks at the discourse level. This corpus has been enriched automatically with part-of-speech (POS) and several queries based on morpho-syntactic information can be realized. We have made available the parallel corpus to the academic community.
In this paper, we introduce a multi-label lazy learning approach to deal with automatic semantic indexing in large document collections in the presence of complex and structured label vocabularies with high inter-label correlation. The proposed method is an evolution of the traditional k-Nearest Neighbors algorithm which uses a large autoencoder trained to map the large label space to a reduced size latent space and to regenerate the predicted labels from this latent space. We have evaluated our proposal in a large portion of the MEDLINE biomedical document collection which uses the Medical Subject Headings (MeSH) thesaurus as a controlled vocabulary. In our experiments we propose and evaluate several document representation approaches and different label autoencoder configurations.
With their huge speaking populations in the world, Spanish and Chinese occupy important positions in linguistic studies. Since the two languages come from different language systems, the translation between Spanish and Chinese is complicated. A comparative study for the language pair can discover the discourse differences between Spanish and Chinese, and can benefit the Spanish-Chinese translation. In this work, based on a Spanish-Chinese parallel corpus annotated with discourse information, we compare the annotation results between the language pair and analyze how discourse affects Spanish-Chinese translation. The research results in our study can help human translators who work with the language pair.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.