A large class of entity extraction tasks from text that is either semistructured or fully unstructured may be addressed by regular expressions, because in many practical cases the relevant entities follow an underlying syntactical pattern and this pattern may be described by a regular expression. In this work we consider the long-standing problem of synthesizing such expressions automatically, based solely on examples of the desired behavior. We present the design and implementation of a system capable of addressing extraction tasks of realistic complexity. Our system is based on an evolutionary procedure carefully tailored to the specific needs of regular expression generation by examples. The procedure executes a search driven by a multiobjective optimization strategy aimed at simultaneously improving multiple performance indexes of candidate solutions while at the same time ensuring an adequate exploration of the huge solution space. We assess our proposal experimentally in great depth, on a number of challenging datasets. The accuracy of the obtained solutions seems to be adequate for practical usage and improves over earlier proposals significantly. Most importantly, our results are highly competitive even with respect to human operators. A prototype is available as a web application at http://regex.inginf.units.it
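The multiobjective search described above can be illustrated with a minimal sketch. The abstract does not say which performance indexes are optimized, so the three objectives used here (extraction precision, extraction recall, and expression length) are assumptions, as are the names `objectives`, `dominates`, and `pareto_front`. This is not the authors' implementation; it only shows how candidate expressions might be scored against examples and filtered by Pareto dominance.

```python
import re
from typing import List, Tuple

# One example: a text plus the (start, end) spans that should be extracted.
Example = Tuple[str, List[Tuple[int, int]]]


def objectives(pattern: str, examples: List[Example]) -> Tuple[float, float, float]:
    """Return (1 - precision, 1 - recall, length); lower is better on every axis.

    The choice of objectives is an assumption made for illustration.
    """
    try:
        rx = re.compile(pattern)
    except re.error:
        # Invalid candidates receive the worst possible score.
        return (1.0, 1.0, float("inf"))
    hits = extracted = desired = 0
    for text, spans in examples:
        # Ignore empty matches: they never correspond to an entity.
        found = {m.span() for m in rx.finditer(text) if m.end() > m.start()}
        wanted = set(spans)
        hits += len(found & wanted)
        extracted += len(found)
        desired += len(wanted)
    precision = hits / extracted if extracted else 0.0
    recall = hits / desired if desired else 0.0
    return (1.0 - precision, 1.0 - recall, float(len(pattern)))


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[str], examples: List[Example]) -> List[str]:
    """Keep the candidates that no other candidate dominates."""
    scored = [(c, objectives(c, examples)) for c in candidates]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]
```

In an evolutionary loop, a filter of this kind would typically decide which candidates survive to the next generation, so that short but inaccurate expressions and accurate but long ones can both remain in the population while the search continues.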
In the north‐western (NW) Mediterranean, the teleosts Diplodus sargus, D. vulgaris and D. annularis coexist in infralittoral habitats. These fishes are infected by two species of the Digenea (Platyhelminthes, Trematoda): Macvicaria crassigula (Opecoelidae) and Monorchis parvus (Monorchiidae) for which we obtained Internal Transcribed Spacer rDNA sequences. Each parasite species represents a complex of two cryptic species, one restricted to D. annularis, and the other shared by D. sargus and D. vulgaris. Cytochrome b mtDNA sequences were used to infer host phylogenetic relationships which showed that the distribution of parasites in Diplodus hosts is not a consequence of coevolutionary interactions. We used diet analyses available for the fish hosts to assess the degree of overlap in the use of food among the three species. The feeding overlap was significant only between D. sargus and D. vulgaris, but not for the other fish pairs. The possible mechanisms involved in the speciation of the digenean fauna of Diplodus fishes are discussed.