Anoop Sarkar scite author profile

Anoop Sarkar

5 Publications

283 Citation Statements Received

81 Citation Statements Given

Affiliations

Simon Fraser University, University of Pennsylvania, Indian Statistical Institute

Publications

Bootstrapping statistical parsers from small datasets

Steedman, Osborne, Sarkar, et al. 2003

We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers when the manually parsed training material is in a different domain to either the raw sentences or the testing material. We show that bootstrapping continues to be useful, even though no manually produced parses from the target domain are used.

Active learning for statistical phrase-based machine translation

Haffari, Roy, Sarkar 2009

Statistical machine translation (SMT) models need large bilingual corpora for training, which are unavailable for some language pairs. This paper provides the first serious experimental study of active learning for SMT. We use active learning to improve the quality of a phrase-based SMT system, and show significant improvements in translation compared to a random sentence selection baseline, when test and training data are taken from the same or different domains. Experimental results are shown in a simulated setting using three language pairs, and in a realistic situation for Bangla-English, a language pair with limited translation resources.

Applying co-training methods to statistical parsing

Sarkar 2001

We propose a novel Co-Training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures for each word in the training set and a large pool of unlabeled text. The algorithm iteratively labels the entire data set with parse trees. Using empirical results based on parsing the Wall Street Journal corpus we show that training a statistical parser on the combined labeled and unlabeled data strongly outperforms training only on the labeled data.

Prediction Improves Simultaneous Neural Machine Translation

Alinejad, Siahbani, Sarkar 2018

Simultaneous speech translation aims to maintain translation quality while minimizing the delay between reading input and incrementally producing the output. We propose a new general-purpose prediction action which predicts future words in the input to improve quality and minimize delay in simultaneous translation. We train this agent using reinforcement learning with a novel reward function. Our agent with prediction has better translation quality and less delay compared to an agent-based simultaneous translation system without prediction.

Automatic extraction of subcategorization frames for Czech

Sarkar, Zeman 2000

We present some novel machine learning techniques for the identification of subcategorization information for verbs in Czech. We compare three different statistical techniques applied to this problem. We show how the learning algorithm can be used to discover previously unknown subcategorization frames from the Czech Prague Dependency Treebank. The algorithm can then be used to label dependents of a verb in the Czech treebank as either arguments or adjuncts. Using our techniques, we are able to achieve 88% precision on unseen parsed text.
