Yoav Katz scite author profile

Active Learning for BERT: An Empirical Study

Real-world scenarios present a challenge for text classification, since labels are usually expensive and the data is often characterized by class imbalance. Active Learning (AL) is a ubiquitous paradigm to cope with data scarcity. Recently, pre-trained NLP models, and BERT in particular, have received massive attention due to their outstanding performance in various NLP tasks. However, the use of AL with deep pre-trained models has so far received little consideration. Here, we present a large-scale empirical study on active learning techniques for BERT-based classification, addressing a diverse set of AL strategies and datasets. We focus on practical scenarios of binary text classification, where the annotation budget is very small and the data is often skewed. Our results demonstrate that AL can boost BERT performance, especially in the most realistic scenario, in which the initial set of labeled examples is created using keyword-based queries, resulting in a biased sample of the minority class. We release our research framework, aiming to facilitate future research along the lines explored here.

An autonomous debating system

Slonim

Bilu

Alzate

et al. 2021

Nature

147

Learning to combine Grammatical Error Corrections

Kantor

Katz

Choshen

et al. 2019

The field of Grammatical Error Correction (GEC) has produced various systems to deal with focused phenomena or general text editing. We propose an automatic way to combine black-box systems. Our method automatically detects the strength of a system, or of a combination of several systems, per error type, improving precision and recall while optimizing F score directly. We show consistent improvement over the best standalone system in all the configurations tested. This approach also outperforms average ensembling of different RNN models with random initializations. In addition, we analyze the use of BERT for GEC, reporting promising results on this end. We also present a spellchecker created for this task, which outperforms the standard spellcheckers we tested on the task of spellchecking. This paper describes a system submission to the Building Educational Applications 2019 Shared Task: Grammatical Error Correction (Bryant et al., 2019). Combining the output of top BEA 2019 shared task systems using our approach currently yields the highest reported score in the open phase of the BEA 2019 shared task, improving F0.5 by 3.7 points over the best result reported.

Throughput maximization of real-time scheduling with batching

Bar-Noy

Guha

Katz

et al. 2009

ACM Trans. Algorithms

We consider the following scheduling with batching problem that has many applications, for example, in multimedia-on-demand and manufacturing of integrated circuits. The input to the problem consists of n jobs and k parallel machines. Each job is associated with a set of time intervals in which it can be scheduled (given either explicitly or nonexplicitly), a weight, and a family. Each family is associated with a processing time. Jobs that belong to the same family can be batched and executed together on the same machine. The processing time of each batch is the processing time of the family of jobs it contains. The goal is to find a nonpreemptive schedule with batching that maximizes the weight of the scheduled jobs. We give constant factor (4 or 4 + ε) approximation algorithms for two variants of the problem, depending on the precise representation of the input. When the batch size is unbounded and each job is associated with a time window in which it can be processed, these approximation ratios reduce to 2 and 2 + ε, respectively. We also give approximation algorithms for two special cases when all release times are the same.

X-Gen: a random test-case generator for systems and SoCs

Emek

Jaeger

Naveh

et al.

Learning microarchitectural behaviors to improve stimuli generation quality

Katz

Rimon

Ziv

et al. 2011

Overview of the 2021 Key Point Analysis Shared Task

Friedman

Dankin

Hou

et al. 2021

YASO: A Targeted Sentiment Analysis Evaluation Dataset for Open-Domain Reviews

Orbach

Toledo-Ronen

Spector

et al. 2021

Current Targeted Sentiment Analysis (TSA) evaluation in a cross-domain setup is restricted to the small set of review domains available in existing datasets. Such an evaluation is limited, and may not reflect true performance on sites like Amazon or Yelp that host diverse reviews from many domains. To address this gap, we present YASO, a new TSA evaluation dataset of open-domain user reviews. YASO contains 2215 English sentences from dozens of review domains, annotated with target terms and their sentiment. Our analysis verifies the reliability of these annotations and explores the characteristics of the collected data. Benchmark results using five contemporary TSA systems show there is ample room for improvement on this challenging new dataset. YASO is available at github.com/IBM/yaso-tsa.

Yoav Katz

Active Learning for BERT: An Empirical Study
