Attenuation Markers of a Candidate Dengue Type 2 Vaccine Virus, Strain 16681 (PDK-53), Are Defined by Mutations in the 5′ Noncoding Region and Nonstructural Proteins 1 and 3

Symbolic sequential data are produced in huge quantities in numerous contexts, such as text and speech data, biometrics, genomics, financial market indexes, music sheets, and online social media posts. In this paper, an unsupervised approach for the chunking of idiomatic units of sequential text data is presented. Text chunking refers to the task of splitting a string of textual information into non-overlapping groups of related units. This is a fundamental problem in numerous fields where understanding the relation between raw units of symbolic sequential data is relevant. Existing methods are based primarily on supervised and semi-supervised learning approaches; in this study, however, a novel unsupervised approach is proposed that builds on the existing concept of n-grams and requires no labeled text as input. The proposed methodology is applied to two natural language corpora: a Wall Street Journal corpus and a Twitter corpus. In both cases, the corpus length was increased gradually to measure the accuracy with different numbers of unitary elements as input. For both corpora, accuracy improves in proportion to the number of tokens. For the Twitter corpus, the increase in accuracy follows a linear trend. The results show that the proposed methodology can achieve higher accuracy with incremental usage. A future study will aim at designing an iterative system for the proposed methodology.
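The abstract does not spell out the chunking procedure, so the following is only a minimal sketch of what an n-gram frequency-based chunker could look like, not the authors' method. It assumes bigrams only and a hypothetical `min_count` threshold: adjacent tokens stay in the same chunk when their bigram occurs at least `min_count` times in the corpus. The function names are illustrative.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def chunk_by_bigram_frequency(tokens, min_count=2):
    """Split tokens into non-overlapping chunks (illustrative heuristic).

    Two neighbouring tokens are kept together when their bigram is
    frequent in the corpus itself; no labeled data is used.
    `min_count` is an assumed threshold, not a value from the paper.
    """
    if not tokens:
        return []
    counts = ngram_counts(tokens, 2)
    chunks = []
    current = [tokens[0]]
    for prev, tok in zip(tokens, tokens[1:]):
        if counts[(prev, tok)] >= min_count:
            current.append(tok)      # frequent bigram: extend the chunk
        else:
            chunks.append(current)   # rare bigram: close the chunk
            current = [tok]
    chunks.append(current)
    return chunks


corpus = "new york is big and new york is old".split()
chunks = chunk_by_bigram_frequency(corpus)
```

Because the counts come from the corpus being chunked, a longer corpus gives the heuristic more evidence, which is in the spirit of the reported accuracy gains with increasing token counts.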


A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text

Lipizzi

Borrelli

Capela

2020

Preprint


Content-Aware Galaxies: Digital Fingerprints of Discussions on Social Media

Babvey

Borrelli

Lipizzi

et al. 2021

IEEE Trans. Comput. Soc. Syst.


A Quantitative and Content-Based Approach for Evaluating the Impact of Counter Narratives on Affective Polarization in Online Discussions

Borrelli

Iandoli

Ramírez-Márquez

et al. 2022

IEEE Trans. Comput. Soc. Syst.


Pheonix at SemEval-2020 Task 5: Masking the Labels Lubricates Models for Sequence Labeling

Babvey

Borrelli

Zhao

et al. 2020


This paper presents the deep-learning model submitted to the SemEval-2020 Task 5 competition: "Detecting Counterfactuals". We participated in both Subtask1 and Subtask2. The model proposed in this paper ranked 2nd in Subtask2: "Detecting antecedent and consequence". Our model approaches the task as a sequence labeling problem. The architecture is built on top of BERT, and a multi-head attention layer with label masking is used to benefit from the mutual information between nearby labels. For prediction, a multi-stage algorithm is used in which the model finalizes the predictions with higher certainty in each step and uses them in the following steps. Our results show that masking the labels is not only an efficient regularization method but also improves the accuracy of the model compared with alternatives such as CRF. Label masking can therefore be used as a regularization method in sequence labeling, and it improves the performance of the model by learning the specific patterns in the target variable.
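The multi-stage prediction step described above can be illustrated with a small sketch. This is not the authors' implementation: `predict` stands in for the BERT-based model, `per_stage` is an assumed parameter controlling how many labels are finalized per step, and `toy_predict` is a hypothetical scorer used only to make the example runnable.

```python
def multistage_decode(predict, n_positions, per_stage=1):
    """Iteratively finalize the most certain labels.

    `predict(labels)` receives the current labels (None = still masked)
    and returns one {label: probability} dict per position.
    """
    labels = [None] * n_positions
    while any(label is None for label in labels):
        probs = predict(labels)
        open_positions = [i for i, label in enumerate(labels) if label is None]
        # Rank the still-masked positions by the model's top probability.
        ranked = sorted(open_positions,
                        key=lambda i: max(probs[i].values()),
                        reverse=True)
        for i in ranked[:per_stage]:
            labels[i] = max(probs[i], key=probs[i].get)  # finalize this label
    return labels


def toy_predict(labels):
    """Hypothetical scorer: certain about position 0, and certain about
    any position whose left neighbour is already finalized."""
    out = []
    for i in range(len(labels)):
        if i == 0:
            out.append({"A": 0.9, "O": 0.1})
        elif labels[i - 1] == "A":
            out.append({"A": 0.8, "O": 0.2})
        elif labels[i - 1] == "O":
            out.append({"A": 0.2, "O": 0.8})
        else:
            out.append({"A": 0.5, "O": 0.5})
    return out


decoded = multistage_decode(toy_predict, n_positions=3)
```

In the toy scorer, each finalized label raises the certainty of its right neighbour, so the decoder resolves the sequence left to right, one position per stage.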


WINS: Web Interface for Network Science via Natural Language Distributed Representations

Borrelli

Saremi

Vallabhaneni

et al. 2020


Correction: Unsupervised acquisition of idiomatic units of symbolic natural language: An n-gram frequency-based approach for the chunking of news articles and tweets

Borrelli

Gongora-Svartzman

Lipizzi

2021

PLoS ONE



Dario Borrelli

Measuring Polarization in Twitter Enabled in Online Political Conversation: The Case of 2016 US Presidential Election

Unsupervised acquisition of idiomatic units of symbolic natural language: An n-gram frequency-based approach for the chunking of news articles and tweets
