In this paper, we introduce and compare two novel approaches, one supervised and one unsupervised, for identifying the keywords to be used in extractive summarization of text documents. Both approaches are based on a graph-based syntactic representation of text and web documents, which enhances the traditional vector-space model by taking into account some structural document features. In the supervised approach, we train classification algorithms on a summarized collection of documents in order to induce a keyword identification model. In the unsupervised approach, we run the HITS algorithm on document graphs under the assumption that the top-ranked nodes should represent the document keywords. Our experiments on a collection of benchmark summaries show that, given a set of summarized training documents, supervised classification provides the highest keyword identification accuracy, while the highest F-measure is reached with a simple degree-based ranking. In addition, it is sufficient to perform only the first iteration of HITS rather than running it to convergence.
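The unsupervised variant described above can be illustrated with a minimal sketch of one HITS iteration over a toy word graph. The graph and tokens below are illustrative examples only, not the paper's data or implementation:

```python
# Minimal sketch of the HITS algorithm on a toy document word graph.
# Edges follow word order; top authority nodes approximate keywords.

def hits_scores(edges, nodes, iterations=1):
    """Run HITS for a given number of iterations.

    edges: list of (source, target) pairs over the word graph.
    Returns (hub, authority) score dicts, L2-normalized.
    """
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of in-neighbors.
        auth = {n: sum(hub[s] for s, t in edges if t == n) for n in nodes}
        # Hub update: sum of authority scores of out-neighbors.
        hub = {n: sum(auth[t] for s, t in edges if s == n) for n in nodes}
        # Normalize so scores stay comparable across iterations.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth

# Toy word graph for a short phrase; edges link consecutive words.
nodes = ["graph", "based", "text", "summarization"]
edges = [("graph", "based"), ("based", "text"),
         ("text", "summarization"), ("graph", "text")]
hub, auth = hits_scores(edges, nodes, iterations=1)
top = max(auth, key=auth.get)
print(top)  # the node with the highest authority score
```

Note that `iterations=1` mirrors the abstract's finding that a single HITS iteration suffices in practice.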
Abstract. In this paper, we introduce DegExt, a graph-based, language-independent keyphrase extractor, which extends the keyword extraction method described in [6]. We compare DegExt with two state-of-the-art approaches to keyphrase extraction: GenEx [11] and TextRank [8]. Our experiments on a collection of benchmark summaries show that DegExt outperforms TextRank and GenEx in terms of precision and area under curve (AUC) for summaries of 15 or more keyphrases, at the expense of a non-significant decrease in recall and F-measure. Moreover, DegExt surpasses both GenEx and TextRank in terms of implementation simplicity and computational complexity.
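The degree-based ranking at the heart of this family of extractors can be sketched as follows. This is a simplified illustration, not the DegExt implementation; the tokenization, window size, and example tokens are assumptions:

```python
# Sketch of degree-based keyword ranking: build an undirected word
# co-occurrence graph and rank words by node degree.
from collections import defaultdict

def degree_rank(words, window=2, top_k=3):
    """Rank words by degree in a co-occurrence graph where an edge
    links two distinct words appearing within `window` positions."""
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    # Higher degree = more distinct co-occurring words.
    return sorted(neighbors, key=lambda w: len(neighbors[w]),
                  reverse=True)[:top_k]

# Toy token stream (illustrative only).
tokens = ["keyphrase", "extraction", "from", "text",
          "extraction", "method"]
print(degree_rank(tokens, top_k=1))
```

The appeal noted in the abstract is visible here: ranking by degree needs a single pass over the graph, with no iterative score propagation.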
Facing the COVID-19 pandemic, governments have implemented a wide range of policies to contain the spread of the virus. During the pandemic, large numbers of COVID-19-related tweets emerge every day. Real-time processing of daily tweets may offer insights for monitoring public opinion about the intervention measures implemented. In this work, we set the lockdown policy in New York State as the target of public opinion research. The task includes two stages: stance detection and opinion monitoring. For the stance detection stage, we explored several combinations of text representations and classification algorithms, finding that the combination of Long Short-Term Memory (LSTM) with Global Vectors for Word Representation (GloVe) outperforms the others. Due to the shortage of labeled data, we adopted a data distillation method to augment the training data. This augmentation improves the performance of the model with a very small amount of manually labeled data. After applying the distillation method, the accuracy of the model improved significantly. Using the enhanced model, we analyzed the automatically classified tweets over time to monitor public opinion. By exploring tweets posted in New York from January 22nd until September 30th, 2020, we show the correlation of public opinion with COVID-19 case and mortality data, and the effect of government responses on opinion shifts. These results demonstrate the capability of the presented method to effectively and efficiently monitor public opinion during a pandemic.
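The distillation loop used for data augmentation can be sketched independently of the LSTM/GloVe classifier. All names and the interface below are hypothetical, not the paper's code:

```python
# Sketch of data distillation for training-data augmentation:
# a teacher trained on scarce labels pseudo-labels unlabeled tweets,
# and a student is retrained on the augmented set.

def distill(train_fn, labeled, unlabeled, threshold=0.9):
    """Hypothetical interface: `train_fn(pairs)` returns a model whose
    predict(text) yields a (label, confidence) tuple.

    labeled:   list of (text, label) pairs (small, manually labeled).
    unlabeled: list of raw texts to be pseudo-labeled.
    """
    teacher = train_fn(labeled)
    pseudo = []
    for text in unlabeled:
        label, conf = teacher.predict(text)
        if conf >= threshold:           # keep only confident pseudo-labels
            pseudo.append((text, label))
    return train_fn(labeled + pseudo)   # student sees augmented data
```

The confidence threshold is the key design choice: a high threshold keeps pseudo-labels clean at the cost of discarding more unlabeled data.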
The COVID-19 outbreak is an ongoing worldwide pandemic that was declared a global health crisis in March 2020. Due to the enormous challenges and high stakes of this pandemic, governments have implemented a wide range of policies aimed at containing the spread of the virus and its negative effects on multiple aspects of our lives. Public responses to the various intervention measures imposed over time can be explored by analyzing social media. Due to the shortage of available labeled data for this new and evolving domain, we apply a data distillation methodology to labeled datasets from related tasks and a very small manually labeled dataset. Our experimental results show that data distillation outperforms other data augmentation methods on our task.
Query-based text summarization aims to extract from the original text the essential information that answers the query. The answer is presented in a minimal, often predefined, number of words. In this paper we introduce a new unsupervised approach to query-based extractive summarization, based on the minimum description length (MDL) principle and employing the Krimp compression algorithm (Vreeken et al., 2011). The key idea of our approach is to select frequent word sets related to a given query that compress document sentences well and therefore describe the document well. A summary is extracted by selecting the sentences that best cover the query-related frequent word sets. The approach is evaluated on the DUC 2005 and DUC 2006 datasets, which are specifically designed for query-based summarization (DUC 2005; DUC 2006). Its results are competitive with the best reported results.
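The sentence-selection step can be illustrated with a greedy set-cover sketch. This is a simplified stand-in for the MDL/Krimp-based selection, not the paper's algorithm; the sentences and word sets below are invented examples:

```python
# Greedy sketch of sentence selection: pick sentences that cover the
# most not-yet-covered query-related frequent word sets. A word set is
# covered when a chosen sentence contains all of its words.

def greedy_summary(sentences, word_sets, max_sentences=2):
    """sentences: list of sentence strings.
    word_sets: list of sets of lowercase words related to the query.
    Returns the chosen sentences in selection order."""
    covered = set()
    chosen = []
    for _ in range(max_sentences):
        def gain(s):
            toks = set(s.lower().split())
            # Number of still-uncovered word sets this sentence covers.
            return sum(1 for i, ws in enumerate(word_sets)
                       if i not in covered and ws <= toks)
        best = max((s for s in sentences if s not in chosen),
                   key=gain, default=None)
        if best is None or gain(best) == 0:
            break  # nothing left to cover
        chosen.append(best)
        toks = set(best.lower().split())
        covered |= {i for i, ws in enumerate(word_sets) if ws <= toks}
    return chosen

sentences = ["the virus spread fast",
             "vaccines reduce virus spread",
             "weather was nice"]
word_sets = [{"virus", "spread"}, {"vaccines", "virus"}]
print(greedy_summary(sentences, word_sets))
```

A single sentence covering both word sets is preferred over two partial ones, which is the compression intuition behind the MDL view.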