Xie, Ruoyu scite author profile

An ongoing challenge in natural language processing is that its major advances disproportionately favor resource-rich languages, leaving a significant number of under-resourced languages behind. Because the resources required to train and evaluate models are lacking, most modern language technologies are either nonexistent or unreliable for endangered, local, and non-standardized languages. Optical character recognition (OCR) is often used to convert endangered-language documents into machine-readable data. However, such OCR output is typically noisy, and most word alignment models are not built to work under such noisy conditions. In this work, we study existing word-level alignment models under noisy settings and aim to make them more robust to noisy data. Our noise simulation and structural biasing method, tested on multiple language pairs, reduces the alignment error rate of a state-of-the-art neural alignment model by up to 59.6%.


GMNLP at SemEval-2023 Task 12: Sentiment Analysis with Phylogeny-Based Adapters

Ibn, Ruoyu, Faisal et al. 2023



Xie, Ruoyu

Synergy between Fenton process and DBD for methyl orange degradation

Noisy Parallel Data Alignment
