Gabriel Pereira Lopes scite author profile

The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, and reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, IR and IE. In this paper we propose two new association measures, the Symmetric Conditional Probability (SCP) and the Mutual Expectation (ME), for the extraction of contiguous and non-contiguous MWUs. Both measures are used by a new algorithm, LocalMaxs, that requires neither empirically obtained thresholds nor complex linguistic filters. We assess the results obtained by both measures by comparing them with reference association measures (Specific Mutual Information, φ², Dice and Log-Likelihood coefficients) over a multilingual parallel corpus. An additional experiment has been carried out over a part-of-speech tagged Portuguese corpus for extracting contiguous compound verbs.

10 This corpus corresponds to the news of some days in January 1994 from Lusa (the Portuguese News Agency).

11 Note the spelling error in 'Republica', which should have been written as 'República'. However, real corpora are like that and we cannot escape it, as some texts reproduce parts of other texts in which the graphical form of words does not correspond to the currently accepted way of writing.

12 We have discarded hapaxes, that is, every "MWU" or "relevant expression" that occurred just once.
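The abstract names SCP and Dice but gives no formulas. As a rough illustration only, the sketch below scores contiguous bigrams using the commonly cited two-word definitions, SCP(x, y) = p(x, y)² / (p(x) · p(y)) (the product of the two conditional probabilities p(x|y) and p(y|x)) and Dice(x, y) = 2 · f(x, y) / (f(x) + f(y)). The function name `bigram_scores` and the choice of one shared normaliser for all probabilities are assumptions made here, and neither ME nor the LocalMaxs selection step is covered.

```python
from collections import Counter


def bigram_scores(tokens):
    """Score every contiguous bigram with SCP and Dice (illustrative sketch).

    Assumption: all probabilities share one normaliser N, so N cancels
    and SCP reduces to f(x, y)^2 / (f(x) * f(y)).
    """
    unigram_freq = Counter(tokens)
    bigram_freq = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (x, y), f_xy in bigram_freq.items():
        f_x = unigram_freq[x]
        f_y = unigram_freq[y]
        scp = (f_xy ** 2) / (f_x * f_y)   # p(x|y) * p(y|x)
        dice = 2 * f_xy / (f_x + f_y)     # Dice coefficient
        scores[(x, y)] = {"scp": scp, "dice": dice}
    return scores


if __name__ == "__main__":
    text = "new york is big and new york is old".split()
    ranked = sorted(bigram_scores(text).items(),
                    key=lambda item: item[1]["scp"], reverse=True)
    for bigram, measures in ranked:
        print(bigram, measures)
```

Under these definitions a pair whose two words only ever occur together scores 1.0 on both measures, which is why a fixed-threshold filter is a natural but corpus-dependent alternative to the threshold-free selection the abstract describes.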


Semelparity in a population of Gracilinanus agilis (Didelphimorphia: Didelphidae) inhabiting the Brazilian cerrado

Lopes

Leiner

2015

Mammalian Biology


Although reproducing once in a lifetime (i.e. semelparity) is considered rare among vertebrates, it has evolved at least five times in two distantly related marsupial families: the Australian Dasyuridae and the South American Didelphidae. The major aim of this research was to describe the population dynamics, reproductive strategy and associated life-history traits of the agile gracile mouse opossum, Gracilinanus agilis, in order to position the species along the fast-slow life-history continuum. Sampling was carried out through mark-recapture, from August 2010 to April 2013, in an area of Brazilian cerrado. Reproductive activity was seasonal and synchronized among females, and occurred from July to January/February. After mating, population size decreased due to male disappearance, which seems to be explained by post-mating male die-off. Phylogenetic predisposition toward semelparity in the Gracilinanus lineage and intense competition for females may contribute to male die-off, as indicated by several lines of evidence, such as a male-biased sex ratio, signs of aggression in reproductive males, and a pronounced gain in male body mass and size prior to mating. Although two litters were produced, most females disappeared after weaning their young, indicating post-reproductive senescence and resulting in discrete, non-overlapping generations, characterizing semelparity in this population of G. agilis.


Clustering Syntactic Positions with Similar Semantic Requirements

Gamallo

Agustini

Lopes

2005

Computational Linguistics


This article describes an unsupervised strategy to acquire syntactico-semantic requirements of nouns, verbs, and adjectives from partially parsed text corpora. The linguistic notion of requirement underlying this strategy is based on two specific assumptions. First, it is assumed that two words in a dependency are mutually required. This phenomenon is called here corequirement. Second, it is also claimed that the set of words occurring in similar positions defines extensionally the requirements associated with these positions. The main aim of the learning strategy presented in this article is to identify clusters of similar positions by identifying the words that define their requirements extensionally. This strategy allows us to learn the syntactic and semantic requirements of words in different positions. This information is used to solve attachment ambiguities. Results of this particular task are evaluated at the end of the article. Extensive experimentation was performed on Portuguese text corpora.
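The idea that a syntactic position is defined extensionally by the words that fill it can be pictured with a toy sketch. Everything below is an assumption made for illustration: positions are represented as plain sets of words, similarity is Jaccard overlap, and clustering is a single greedy pass with a hypothetical `threshold`. The abstract does not state which similarity measure or clustering procedure the article uses.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_positions(positions, threshold=0.5):
    """Greedily group positions whose filler words overlap enough.

    positions: dict mapping a position label (e.g. "drink:obj") to the
    set of words observed in that position. Both the labels and the
    threshold are illustrative assumptions.
    Returns a list of (position_labels, merged_words) pairs.
    """
    clusters = []
    for label, words in positions.items():
        for labels, members in clusters:
            if jaccard(words, members) >= threshold:
                labels.add(label)      # position joins an existing cluster
                members |= words       # cluster's word set grows extensionally
                break
        else:
            clusters.append(({label}, set(words)))
    return clusters


if __name__ == "__main__":
    observed = {
        "drink:obj": {"water", "wine", "beer"},
        "sip:obj": {"wine", "beer", "tea"},
        "drive:obj": {"car", "truck"},
    }
    for labels, members in cluster_positions(observed):
        print(sorted(labels), sorted(members))
```

In this toy setting, the merged word set of a cluster plays the role of the shared requirement: a new word seen in one position of the cluster becomes a plausible filler for the others.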


A Document Descriptor Extractor Based on Relevant Expressions

Silva

Lopes

2009


First Steps Towards Coverage-Based Document Alignment

Gomes

Lopes

2016


In this paper we describe a method for selecting pairs of parallel documents (documents that are a translation of each other) from a large collection of documents obtained from the web. Our approach is based on a coverage score that reflects the number of distinct bilingual phrase pairs found in each pair of documents, normalized by the total number of unique phrases found in them. Since parallel documents tend to share more bilingual phrase pairs than non-parallel documents, our alignment algorithm selects pairs of documents with the maximum coverage score from all possible pairings involving either one of the two documents.
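A minimal sketch of the coverage idea described above, under several assumptions that the abstract does not confirm: each document is reduced to a set of phrases, the bilingual phrase pairs come from a hypothetical `lexicon` set of (source, target) tuples, and the normaliser is the sum of the unique phrase counts of the two documents. The names `coverage_score` and `align` are illustrative only.

```python
def coverage_score(src_phrases, tgt_phrases, lexicon):
    """Distinct bilingual phrase pairs found in a document pair,
    normalised by the number of unique phrases in both documents."""
    matched = {(s, t) for (s, t) in lexicon
               if s in src_phrases and t in tgt_phrases}
    total = len(src_phrases) + len(tgt_phrases)
    return len(matched) / total if total else 0.0


def align(src_docs, tgt_docs, lexicon):
    """Keep a pair only if its score is the maximum over every pairing
    that involves either of its two documents."""
    scores = {(a, b): coverage_score(sp, tp, lexicon)
              for a, sp in src_docs.items()
              for b, tp in tgt_docs.items()}
    best_for_src = {a: max((scores[(a, b)] for b in tgt_docs), default=0.0)
                    for a in src_docs}
    best_for_tgt = {b: max((scores[(a, b)] for a in src_docs), default=0.0)
                    for b in tgt_docs}
    return [(a, b) for (a, b), s in scores.items()
            if s > 0 and s == best_for_src[a] and s == best_for_tgt[b]]


if __name__ == "__main__":
    lexicon = {("casa", "house"), ("gato", "cat"), ("cão", "dog")}
    src_docs = {"pt1": {"casa", "gato"}, "pt2": {"cão"}}
    tgt_docs = {"en1": {"house", "cat"}, "en2": {"dog", "house"}}
    print(align(src_docs, tgt_docs, lexicon))
```

Requiring the score to be the maximum for both documents of a pair is one way to read "all possible pairings involving either one of the two documents"; it leaves a document unaligned when its best partner prefers another document.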


Measuring Spelling Similarity for Cognate Identification

Gomes

Lopes

2011


Using co-composition for acquiring syntactic and semantic subcategorisation

Gamallo

Agustini

Lopes

2002


Natural language parsing requires extensive lexicons containing subcategorisation information for specific sublanguages. This paper describes an unsupervised method for acquiring both syntactic and semantic subcategorisation restrictions from corpora. Special attention will be paid to the role of co-composition in the acquisition strategy. The acquired information is used for lexicon tuning and parsing improvement.


Using Linked Data in the Data Integration for Maternal and Infant Death Risk of the SUS in the GISSA Project

Freitas

Rocha

Braga

et al. 2017

