Purpose: This study evaluates the performance of freely available machine translation (MT) services in translating metadata records.
Design/methodology/approach: Randomly selected metadata records were translated from English into Chinese using the Google, Bing, and SYSTRAN MT systems. The translations were then evaluated on a five-point scale for both Fluency and Adequacy. Missing Count (words not translated) and Incorrect Count (words translated incorrectly) were also recorded.
Findings: For both Fluency and Adequacy, more than 70% of Google's and Bing's translations of the test data received scores of three or higher, corresponding to 'non-native Chinese' and 'much coverage,' respectively. SYSTRAN scored lowest on both measures; however, these differences were not statistically significant. A Pearson correlation analysis demonstrated a strong relationship (r = .86) between Fluency and Adequacy, and Missing Count and Incorrect Count correlated strongly with both Fluency and Adequacy.
Research limitations/implications: This study was conducted in a specific domain with a small sample size. Evaluation with a larger, more representative test dataset is needed, and other language pairs should be evaluated using a similar methodology.
Originality/value: Most existing digital collections can be accessed only in English. Few digital collections in the United States support multilingual information access (MLIA), which enables users of differing languages to search, browse, recognize, and use information in the collections. Human translation is one solution, but it is neither time- nor cost-effective for most libraries. This study serves as a first step toward understanding the performance of current MT systems and designing effective and efficient MLIA services for digital collections.