Zaid Sheikh scite author profile

A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers

Most state-of-the-art models for named entity recognition (NER) rely on the availability of large amounts of labeled data, making them challenging to extend to new, lower-resourced languages. However, there are now several proposed approaches involving either cross-lingual transfer learning, which learns from other highly resourced languages, or active learning, which efficiently selects effective training data based on model predictions. This paper poses the question: given this recent progress, and limited human annotation, what is the most effective method for efficiently creating high-quality entity recognizers in under-resourced languages? Based on extensive experimentation using both simulated and real human annotation, we find a dual-strategy approach best, starting with a cross-lingual transferred model, then performing targeted annotation of only uncertain entity spans in the target language, minimizing annotator effort. Results demonstrate that cross-lingual transfer is a powerful tool when very little data can be annotated, but an entity-targeted annotation strategy can achieve competitive accuracy quickly, with just one-tenth of training data. The code is publicly available.

Models of tone for tonal and non-tonal languages

Metze, Sheikh, Waibel, et al. 2013

Conventional wisdom in automatic speech recognition asserts that pitch information is not helpful in building speech recognizers for non-tonal languages and contributes only modestly to performance in speech recognizers for tonal languages. To maintain consistency between different systems, pitch is therefore often ignored, trading the slight performance benefits for greater system uniformity/simplicity. In this paper, we report results that challenge this conventional approach. We present new models of tone that deliver consistent performance improvements for tonal languages (Cantonese, Vietnamese) and even modest improvements for non-tonal languages. Using neural networks for feature integration and fusion, these models achieve significant gains throughout, and provide us with system uniformity and standardization across all languages, tonal and non-tonal.

Automatic Extraction of Rules Governing Morphological Agreement

Chaudhary, Anastasopoulos, Pratapa, et al. 2020

Creating a descriptive grammar of a language is an indispensable step for language documentation and preservation. However, at the same time it is a tedious, time-consuming task. In this paper, we take steps towards automating this process by devising an automated framework for extracting a first-pass grammatical specification from raw text in a concise, human- and machine-readable format. We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages. We apply our framework to all languages included in the Universal Dependencies project, with promising results. Using cross-lingual transfer, even with no expert annotations in the language of interest, our framework extracts a grammatical specification which is nearly equivalent to those created with large amounts of gold-standard annotated data. We confirm this finding with human expert evaluations of the rules that our framework produces, which have an average accuracy of 78%. We release an interface demonstrating the extracted rules at https://neulab.github.io/lase/. The code is publicly available.

A neural network based approach for background noise reduction in airborne acoustic emission of a machining process

et al. 2017

Energy prediction of a combined cycle power plant using a particle swarm optimization trained feedforward neural network

Rashid, Kamal, Zafar, et al. 2015

Semi-supervised training in low-resource ASR and KWS

Metze, Gandhe, Miao, et al. 2015

Tool health monitoring for wood milling process using airborne acoustic emission

Zafar, Kamal, Sheikh, et al. 2015

Reducing Confusion in Active Learning for Part-Of-Speech Tagging

Chaudhary, Anastasopoulos, Sheikh, et al. 2021

Transactions of the Association for Computational Linguistics

Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost. This is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS) taggers. Existing AL heuristics are generally designed on the principle of selecting uncertain yet representative training instances, where annotating these instances may reduce a large number of errors. However, in an empirical study across six typologically diverse languages (German, Swedish, Galician, North Sami, Persian, and Ukrainian), we found the surprising result that even in an oracle scenario where we know the true uncertainty of predictions, these current heuristics are far from optimal. Based on this analysis, we pose the problem of AL as selecting instances that maximally reduce the confusion between particular pairs of output tags. Extensive experimentation on the aforementioned languages shows that our proposed AL strategy outperforms other AL strategies by a significant margin. We also present auxiliary results demonstrating the importance of proper calibration of models, which we ensure through cross-view training, and analysis demonstrating how our proposed strategy selects examples that more closely follow the oracle data distribution. The code is publicly released.
