Moisés G. de Carvalho scite author profile

Identifying and handling replicas is important to guarantee the quality of the information made available by modern data storage services. Companies and governments have invested heavily in the development of effective methods for removing replicas from large databases. Typically, this investment has produced significant results, since clean, replica-free databases not only allow the retrieval of higher-quality information but also lead to a more concise data representation and to potential savings in the computational time and resources needed to process and maintain this data. In this paper, we propose a GP-based approach to automatic replica identification that combines evidence based on the data content in order to find a similarity function that is able to identify whether two entries in a repository are replicas or not. As shown by our experiments, our approach outperforms an SVM-based method used as a baseline by at least 6.5%. Moreover, the suggested functions are computationally less demanding since they use less evidence. In addition, our approach is capable of automatically adapting to any given replica identification boundary.
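The idea of a similarity function that combines evidence and is compared against a replica identification boundary can be illustrated with a minimal Python sketch. This is not the authors' method: the combination below is written by hand, whereas the abstract describes finding such a function automatically. The field names (`name`, `city`), the weights, the default boundary of 0.8, and all function names are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for any evidence source."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_similarity(rec_a: dict, rec_b: dict) -> float:
    """Hypothetical hand-written combination of two pieces of evidence.

    An automatic approach would search for an expression like this
    instead of fixing the weights by hand.
    """
    name_sim = string_similarity(rec_a["name"], rec_b["name"])
    city_sim = string_similarity(rec_a["city"], rec_b["city"])
    return (2.0 * name_sim + city_sim) / 3.0


def is_replica(rec_a: dict, rec_b: dict, boundary: float = 0.8) -> bool:
    """Flag a pair as replicas when the combined score reaches the boundary."""
    return combined_similarity(rec_a, rec_b) >= boundary


if __name__ == "__main__":
    a = {"name": "Maria Silva", "city": "Manaus"}
    b = {"name": "Maria Sliva", "city": "Manaus"}  # same entry with a typo
    print(combined_similarity(a, b), is_replica(a, b))
```

Raising or lowering `boundary` makes the decision stricter or more permissive without changing the similarity function itself.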

An evolutionary approach to complex schema matching

Carvalho

Laender

Gonçalves

et al. 2013

Information Systems

An evolutionary approach for combining different sources of evidence in search engines

Silva

Moura

Cavalcanti

et al. 2009

Information Systems

A Quantitative Analysis of Learning Objects and Their Metadata in Web Repositories

Carvalho

Guimaraes

et al. 2016

Cowpea aphid-borne mosaic virus (CABMV) is widespread in passionfruit in Brazil and causes passionfruit woodiness disease

Nascimento

Santana

Braz

et al. 2006

Arch Virol
