Masoud Jalili Sabet scite author profile

5 Publications

41 Citation Statements Received

58 Citation Statements Given

Affiliations

Ludwig-Maximilians-Universität München, University of Tehran

Publications

SimAlign: High Quality Word Alignments Without Parallel Training Data Using Static and Contextualized Embeddings

Sabet, Dufter, Yvon et al. 2020

Word alignments are useful for tasks like statistical and neural machine translation (NMT) and cross-lingual annotation projection. Statistical word aligners perform well, as do methods that extract alignments jointly with translations in NMT. However, most approaches require parallel training data, and quality decreases as less training data is available. We propose word alignment methods that require no parallel data. The key idea is to leverage multilingual word embeddings, both static and contextualized, for word alignment. Our multilingual embeddings are created from monolingual data only, without relying on any parallel data or dictionaries. We find that alignments created from embeddings are superior for four and comparable for two language pairs compared to those produced by traditional statistical aligners, even with abundant parallel data; e.g., contextualized embeddings achieve a word alignment F1 for English-German that is 5 percentage points higher than eflomal, a high-quality statistical aligner, trained on 100k parallel sentences.

ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus

ImaniGooghari, Sabet, Dufter et al. 2021

With more than 7000 languages worldwide, multilingual natural language processing (NLP) is essential from both an academic and a commercial perspective. Researching typological properties of languages is fundamental for progress in multilingual NLP. Examples include assessing language similarity for effective transfer learning, injecting inductive biases into machine learning models, or creating resources such as dictionaries and inflection tables. We provide ParCourE, an online tool that allows users to browse a word-aligned parallel corpus covering 1334 languages. We give evidence that this is useful for typological research. ParCourE can be set up for any parallel corpus and can thus be used for typological research on other corpora as well as for exploring their quality and properties.

Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective

Poerner, Sabet, Roth et al. 2018

Preprint

Automatic translation memory cleaning

Negri, Ataman, Sabet et al. 2017

Machine Translation

CaMEL: Case Marker Extraction without Labels

Weissweiler, Hofmann, Sabet et al. 2022

We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages. We propose a first model for CaMEL that uses a massively multilingual corpus to extract case markers in 83 languages based only on a noun phrase chunker and an alignment system. To evaluate CaMEL, we automatically construct a silver standard from UniMorph. The case markers extracted by our model can be used to detect and visualise similarities and differences between the case systems of different languages as well as to annotate fine-grained deep cases in languages in which they are not overtly marked.
