Sunyam Bagga scite author profile

Sunyam Bagga

5 Publications

7 Citation Statements Received

72 Citation Statements Given

Affiliations

McGill University, Delhi Technological University

Publications

HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust

Bagga, Piper (2022)

We present a new dataset built on prior work consisting of 1,671,370 randomly sampled pages of English-language prose roughly divided between modes of fictional and non-fictional writing and published between the years 1800 and 2000. In addition to focusing on the "page" as the basic bibliographic unit, our work employs a single predictive model for the historical period under consideration in contrast to prior work. Besides publication metadata, we also provide an enriched feature set of 107 features including part-of-speech tags, sentiment scores, word supersenses and more. Our data is designed to give researchers in the digital humanities large yet portable random samples of historical writing across two foundational modes of English prose writing. We present initial insights into transformations of linguistic patterns across this historical period using our enriched features as possible pointers to future work. The data can be accessed at https://doi.org/10.7910/DVN/HAKKUA.

Opportunistic Self Organizing Migrating Algorithm for real-time Dynamic Traveling Salesman Problem

Dokania, Bagga, Sharma (2017)

Self Organizing Migrating Algorithm (SOMA) is a meta-heuristic algorithm based on the self-organizing behavior of individuals in a simulated social environment. SOMA performs iterative computations on a population of potential solutions in the given search space to obtain an optimal solution. In this paper, an Opportunistic Self Organizing Migrating Algorithm (OSOMA) has been proposed that introduces a novel strategy to generate perturbations effectively. This strategy allows the individual to span more possible solutions and is thus able to produce better solutions. A comprehensive analysis of OSOMA on multi-dimensional unconstrained benchmark test functions is performed. OSOMA is then applied to solve the real-time Dynamic Traveling Salesman Problem (DTSP). The problem of real-time DTSP has been stipulated and simulated using real-time data from Google Maps with a varying cost-metric between any two cities. Although DTSP is a very common and intuitive model in the real world, its presence in the literature is still very limited. OSOMA performs exceptionally well on the problems mentioned above. To substantiate this claim, the performance of OSOMA is compared with SOMA, Differential Evolution and Particle Swarm Optimization.

“Are you kidding me?”: Detecting Unpalatable Questions on Reddit

Bagga, Piper, Ruths (2021)

Abusive language in online discourse negatively affects a large number of social media users. Many computational methods have been proposed to address this issue of online abuse. The existing work, however, tends to focus on detecting the more explicit forms of abuse leaving the subtler forms of abuse largely untouched. Our work addresses this gap by making three core contributions. First, inspired by the theory of impoliteness, we propose a novel task of detecting a subtler form of abuse, namely unpalatable questions. Second, we publish a context-aware dataset for the task using data from a diverse set of Reddit communities. Third, we implement a wide array of learning models and also investigate the benefits of incorporating conversational context into computational models. Our results show that modeling subtle abuse is feasible but difficult due to the language involved being highly nuanced and context-sensitive. We hope that future research in the field will address such subtle forms of abuse since their harm currently passes unnoticed through existing detection systems.

Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding

Ghaddar, Wu, Bagga et al. (2022)

Preprint

Toward a Data-Driven Theory of Narrativity

Piper, Bagga (2022)

nlh

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
