Ahsaas Bajaj scite author profile

Ahsaas Bajaj

5 Publications

28 Citation Statements Received

74 Citation Statements Given


Affiliations

Samsung (India), Netaji Subhas University of Technology

Publications


Long Document Summarization in a Low Resource Setting using Pretrained Language Models

Bajaj¹, Dangati², Krishna³ et al. 2021


Abstractive summarization is the task of compressing a long document into a coherent short document while retaining salient information. Modern abstractive summarization methods are based on deep neural networks which often require large training datasets. Since collecting summarization datasets is an expensive and time-consuming task, practical industrial settings are usually low-resource. In this paper, we study a challenging low-resource setting of summarizing long legal briefs with an average source document length of 4268 words and only 120 available (document, summary) pairs. To account for data scarcity, we used a modern pretrained abstractive summarizer BART (Lewis et al., 2020), which only achieves 17.9 ROUGE-L as it struggles with long documents. We thus attempt to compress these long documents by identifying salient sentences in the source which best ground the summary, using a novel algorithm based on GPT-2 (Radford et al., 2019) language model perplexity scores, that operates within the low resource regime. On feeding the compressed documents to BART, we observe a 6.0 ROUGE-L improvement. Our method also beats several competitive salience detection baselines. Furthermore, the identified salient sentences tend to agree with an independent human labeling by domain experts.


Entailment and Spectral Clustering based Single and Multiple Document Summarization

Gupta¹, Kaur², Bajaj³ et al. 2019

IJISA


Text connectedness is an important feature for content selection in text summarization methods. Recently, Textual Entailment (TE) has been successfully employed to measure sentence connectedness in order to determine sentence salience in single document text summarization. In literature, Analog Textual Entailment and Spectral Clustering (ATESC) is one such method which has used TE to compute inter-sentence connectedness scores. These scores are used to compute salience of sentences and are further utilized by Spectral Clustering algorithm to create segments of sentences. Finally, the most salient sentences are extracted from the most salient segments for inclusion in the final summary. The method has shown good performance earlier. But the authors observe that TE has never been employed for the task of multi-document summarization. Therefore, this paper has proposed ATESC based new methods for the same task. The experiments conducted on DUC 2003 and 2004 datasets reveal that the notion of Textual Entailment along with Spectral Clustering algorithm proves to be an effective duo for redundancy removal and generating informative summaries in multi-document summarization. Moreover, the proposed methods have exhibited faster execution times.


Learning Mobile App Embeddings Using Multi-task Neural Network

Bajaj, Krishna, Tiwari et al. 2019


Long Document Summarization in a Low Resource Setting using Pretrained Language Models

Bajaj¹, Dangati², Krishna³ et al. 2021

Preprint


An Instance Level Approach for Shallow Semantic Parsing in Scientific Procedural Text

Swarup¹, Bajaj², Mysore³ et al. 2020


In specific domains, such as procedural scientific text, human labeled data for shallow semantic parsing is especially limited and expensive to create. Fortunately, such specific domains often use rather formulaic writing, such that the different ways of expressing relations in a small number of grammatically similar labeled sentences may provide high coverage of semantic structures in the corpus, through an appropriately rich similarity metric. In light of this opportunity, this paper explores an instance-based approach to the relation prediction sub-task within shallow semantic parsing, in which semantic labels from structurally similar sentences in the training set are copied to test sentences. Candidate similar sentences are retrieved using SciBERT embeddings. For labels where it is possible to copy from a similar sentence we employ an instance level copy network, when this is not possible, a globally shared parametric model is employed. Experiments show our approach outperforms both baseline and prior methods by 0.75 to 3 F1 absolute in the Wet Lab Protocol Corpus and 1 F1 absolute in the Materials Science Procedural Text Corpus.


