Abstract. This paper presents a machine learning approach to the study of translationese. The goal is to train a computer system to distinguish between translated and non-translated text, in order to determine the characteristic features that influence the classifiers. Several algorithms reach a success rate of up to 97.62% on a technical dataset. Moreover, the SVM classifier consistently shows a statistically significant improvement in accuracy when simplification features are added to the basic translational classifier system. These findings may therefore be considered an argument for the existence of the Simplification Universal.
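As an illustration of what simplification features might look like, the following minimal sketch computes three surface measures often associated with simplification: type-token ratio, mean sentence length, and mean word length. The abstract does not list the features actually used, so this feature set, the function name `simplification_features`, and the naive tokenisation are all assumptions made for illustration only.

```python
import re


def simplification_features(text):
    """Return hypothetical simplification indicators for one text.

    The three measures below are illustrative assumptions, not the
    feature set of the paper summarised above.
    """
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenisation: lowercase alphanumeric runs.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens or not sentences:
        raise ValueError("text must contain at least one word")
    return {
        # Lexical variety: distinct word forms over total words.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Average number of words per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
        # Average number of characters per word.
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }
```

Values of this kind could be appended to a classifier's base feature vector, which would allow accuracy with and without them to be compared.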
When facing new fields, interpreters need to perform extensive searches for specialised knowledge and terminology. They require this information prior to an interpretation and need to have it accessible during the interpreting service. Fortunately, several terminology management tools are currently available to assist interpreters before and during an interpretation service. Although these tools appear to be quite similar, they provide different kinds of features and, as a result, exhibit different degrees of usefulness. This paper describes current terminology management tools with a view to establishing a set of features for assessing the extent to which these tools meet the specific needs of interpreters. Subsequently, a comparative analysis is performed to evaluate the tools against the list of features previously identified.
This paper explores the present and future possibilities that corpus linguistics offers for Translation Studies, with special reference to the pedagogical dimension. At present, corpus-based research is an essential component of machine translation systems, terminology and concept extraction programs, contrastive studies, and the characterisation of translated language. The two types of corpus most widely used for these purposes are comparable and parallel corpora. This article, however, starts from an ad hoc corpus of comparable original texts, used as a macro-source of documentation for the teaching and professional practice of specialised inverse translation.
This paper describes the system submitted by the University of Wolverhampton and the University of Malaga for SemEval-2015 Task 2: Semantic Textual Similarity. The system uses a Support Vector Machine approach based on a number of linguistically motivated features. Our system performed satisfactorily for English, obtaining a mean Pearson correlation of 0.7216. However, it performed less adequately for Spanish, obtaining a mean of only 0.5158.
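For readers unfamiliar with the evaluation measure, the sketch below shows how a Pearson correlation between gold similarity scores and system predictions can be computed, and how a mean over several test sets might be obtained. The function names `pearson` and `mean_pearson` are hypothetical, and the unweighted mean is an assumption; the abstract does not state how the per-dataset scores were averaged.

```python
from math import sqrt


def pearson(gold, predicted):
    """Pearson correlation between two equal-length score lists."""
    if len(gold) != len(predicted) or len(gold) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(predicted) / n
    # Sum of products of deviations (covariance times n).
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, predicted))
    # Sums of squared deviations (variances times n).
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_g * var_p)


def mean_pearson(datasets):
    """Unweighted mean of per-dataset correlations (an assumption).

    `datasets` is a list of (gold, predicted) pairs, one per test set.
    """
    scores = [pearson(gold, predicted) for gold, predicted in datasets]
    return sum(scores) / len(scores)
```

A perfectly linear increasing relation between gold and predicted scores gives a correlation of 1, and a perfectly linear decreasing relation gives -1.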