Aldrian Obaja Muis scite author profile

Aldrian Obaja Muis

5 Publications

92 Citation Statements Received

110 Citation Statements Given

How they've been cited

228

How they cite others

109

Affiliations

Singapore University of Technology and Design

Publications

Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators

Muis¹, Lu²

2017

101

In this paper, we propose a new model that is capable of recognizing overlapping mentions. We introduce a novel notion of mention separators that can be effectively used to capture how mentions overlap with one another. On top of a novel multigraph representation that we introduce, we show that efficient and exact inference can still be performed. We present some theoretical analysis on the differences between our model and a recently proposed model for recognizing overlapping mentions, and discuss the possible implications of the differences. Through extensive empirical analysis on standard datasets, we demonstrate the effectiveness of our approach.

Learning to Recognize Discontiguous Entities

Muis¹, Lu²

2016

This paper focuses on the study of recognizing discontiguous entities. Motivated by previous work, we propose to use a novel hypergraph representation to jointly encode discontiguous entities of unbounded length, which can overlap with one another. To compare with existing approaches, we first formally introduce the notion of model ambiguity, which defines the difficulty level of interpreting the outputs of a model, and then formally analyze the theoretical advantages of our model over existing approaches based on linear-chain CRFs. Our empirical results also show that our model is able to achieve significantly better results when evaluated on standard data with many discontiguous entities.

This is the supplementary material for "Learning to Recognize Discontiguous Entities" [Muis and Lu, 2016]. This material gives more details on the experiment setup and the ambiguity of each model, and compares the models from a theoretical point of view.

MalwareTextDB: A Database for Annotated Malware Articles

Lim¹, Muis², Lu³

et al. 2017

Cybersecurity risks and malware threats are becoming increasingly dangerous and common. Despite the severity of the problem, there have been few NLP efforts focused on tackling cybersecurity. In this paper, we discuss the construction of a new database for annotated malware texts. An annotation framework is introduced based around the MAEC vocabulary for defining malware characteristics, along with a database consisting of 39 annotated APT reports with a total of 6,819 sentences. We also use the database to construct models that can potentially help cybersecurity researchers in their data collection and analytics efforts.

Efficient Dependency-Guided Named Entity Recognition

Jie

Muis

2017

AAAI

Named entity recognition (NER), which focuses on the extraction of semantically meaningful named entities and their semantic classes from text, serves as an indispensable component for several downstream natural language processing (NLP) tasks such as relation extraction and event extraction. Dependency trees, on the other hand, also convey crucial semantic-level information. It has been shown previously that such information can be used to improve the performance of NER. In this work, we investigate how to better utilize the structured information conveyed by dependency trees to improve the performance of NER. Specifically, unlike existing approaches, which only exploit dependency information for designing local features, we show that certain global structured information of the dependency trees can be exploited when building NER models, where such information can provide guided learning and inference. Through extensive experiments, we show that our proposed novel dependency-guided NER model performs competitively with models based on conventional semi-Markov conditional random fields, while requiring significantly less running time.

Weak Semi-Markov CRFs for Noun Phrase Chunking in Informal Text

Muis

2016

This paper introduces a new annotated corpus based on an existing informal text corpus: the NUS SMS Corpus (Chen and Kan, 2013). The new corpus includes 76,490 noun phrases from 26,500 SMS messages, annotated by university students. We then explored several graphical models, including a novel variant of the semi-Markov conditional random fields (semi-CRF), for the task of noun phrase chunking. We demonstrated through empirical evaluations on the new dataset that the new variant yielded similar accuracy but required significantly less running time than the conventional semi-CRF.
