A large class of entity extraction tasks from text that is either semistructured or fully unstructured may be addressed by regular expressions, because in many practical cases the relevant entities follow an underlying syntactical pattern and this pattern may be described by a regular expression. In this work we consider the long-standing problem of synthesizing such expressions automatically, based solely on examples of the desired behavior. We present the design and implementation of a system capable of addressing extraction tasks of realistic complexity. Our system is based on an evolutionary procedure carefully tailored to the specific needs of regular expression generation by examples. The procedure executes a search driven by a multiobjective optimization strategy aimed at simultaneously improving multiple performance indexes of candidate solutions while at the same time ensuring an adequate exploration of the huge solution space. We assess our proposal experimentally in great depth, on a number of challenging datasets. The accuracy of the obtained solutions seems to be adequate for practical usage and improves over earlier proposals significantly. Most importantly, our results are highly competitive even with respect to human operators. A prototype is available as a web application at http://regex.inginf.units.it
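The multiobjective search mentioned above can be illustrated with a Pareto-dominance filter. This is only a minimal sketch: the abstract does not name the objectives, so the two used here (an error rate and the expression length, both minimized) and the names `dominates` and `pareto_front` are assumptions made for illustration, not the system's actual fitness definition.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores):
    """Keep the score tuples that no other tuple dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


# Hypothetical candidates: (error rate, regex length in characters).
candidates = [(0.10, 12), (0.10, 20), (0.30, 5), (0.40, 9)]
front = pareto_front(candidates)
```

Here `(0.10, 20)` is discarded because `(0.10, 12)` is equally accurate and shorter, and `(0.40, 9)` is discarded because `(0.30, 5)` is better on both objectives. The two survivors represent different trade-offs, which is what allows a search of this kind to keep exploring more than one direction.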
With the wide diffusion of smartphones and their usage in a plethora of processes and activities, these devices have been handling an increasing variety of sensitive resources. Attackers are hence producing a large number of malware applications for Android (the most widespread mobile platform), often by slightly modifying existing applications, which results in malware being organized in families. Some works in the literature showed that opcodes are informative for detecting malware, not only on the Android platform. In this paper, we investigate whether frequencies of n-grams of opcodes are effective in detecting Android malware and whether there are significant malware families for which they are more or less effective. To this end, we designed a method based on state-of-the-art classifiers applied to frequencies of opcode n-grams. Then, we experimentally evaluated it on a recent dataset composed of 11120 applications, 5560 of which are malware belonging to several different families. Results show that an average accuracy of 97% can be obtained, while a perfect detection rate is achieved for more than one malware family.
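The feature described above, frequencies of opcode n-grams, can be sketched as follows. This is a minimal illustration, assuming the opcodes of an application are already available as a list of strings; the function name `ngram_frequencies` and the sample sequence are hypothetical, and the abstract does not specify how the authors extract or normalize the n-grams.

```python
from collections import Counter


def ngram_frequencies(opcodes, n):
    """Return the relative frequency of each contiguous opcode n-gram."""
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical opcode sequence for one application.
sample = ["move", "invoke-virtual", "move", "invoke-virtual", "return-void"]
freqs = ngram_frequencies(sample, 2)
```

Each application would then be represented by a vector of such frequencies, which a classifier could take as input.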
We propose a system for the automatic generation of regular expressions for text-extraction tasks. The user describes the desired task only by means of a set of labeled examples. The generated regexes may be used with common engines such as those that are part of Java, PHP, Perl and so on. Usage of the system does not require any familiarity with regular expression syntax. We performed an extensive experimental evaluation on 12 different extraction tasks applied to real-world datasets. We obtained very good results in terms of precision and recall, even in comparison to earlier state-of-the-art proposals. Our results are highly promising toward the achievement of a practical surrogate for the specific skills required for generating regular expressions, and significant as a demonstration of what can be achieved with GP-based approaches on modern IT technology.
schemas, extraction of bibliographic citations, network packets rewriting, network traffic classification, signal processing hardware design, malware and phishing detection and so on. Constructing a regular expression suitable for a specific task is a tedious and error-prone process, which requires specialized skills including familiarity with the formalism used by practical engines. For this reason, several approaches for generating regular expressions automatically have been proposed in the literature, with varying degrees of practical applicability (see Section 2 for a detailed discussion). In this work we focus on text extraction tasks and describe the design, implementation and experimental evaluation of a system for the automatic generation of regular expressions from examples. The user is required to describe the desired task by providing a set of examples, in the form of strings in which each string is accompanied by the (possibly empty) substring to be extracted.
Based on these examples, the system generates a regular expression suitable for use with widespread and popular engines such as the libraries of Java, PHP, Perl and so on. The system is internally based on multi-objective Genetic Programming (GP), a computational paradigm inspired by biological evolution [1]. We remark that all the user has to provide is a set of examples. In particular, the user need not provide any initial regular expression or hints about the structure or symbols of the target expression. Usage of the system thus requires familiarity with neither GP nor regular expression syntax. We performed an extensive experimental evaluation of our proposal on 12 different extraction tasks: email addresses, IP addresses, MAC (Ethernet card-level) addresses, web URLs, HTML headings, Italian Social Security Numbers, phone numbers, HREF attributes, Twitter hashtags and citations. All of these datasets are real rather than synthetic, with one exception: the Italian Social Security Numbers dataset. We obtained very good results for precision and recall in all the experiments. Some of these datasets were used by earlier state-of-the-art proposals and our results compare very favorably even ...
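The example format and the precision/recall evaluation described above can be sketched as follows. This is a simplified illustration under stated assumptions: each example is a pair of a string and the (possibly empty) substring to be extracted, only the first match per string is considered, and an extraction counts as correct only if it equals the desired substring exactly. The function name `precision_recall` and the sample data are hypothetical, and Python's `re` module stands in for the Java, PHP and Perl engines named in the text, whose syntax may differ in details.

```python
import re


def precision_recall(pattern, examples):
    """Score a candidate regex on (text, desired_substring) examples."""
    regex = re.compile(pattern)
    correct = extracted = desired = 0
    for text, wanted in examples:
        match = regex.search(text)
        found = match.group(0) if match else ""
        if found:
            extracted += 1   # the regex extracted something
        if wanted:
            desired += 1     # something should have been extracted
        if found and found == wanted:
            correct += 1     # extraction matches the label exactly
    precision = correct / extracted if extracted else 0.0
    recall = correct / desired if desired else 0.0
    return precision, recall


# Hypothetical labeled examples; an empty string means "extract nothing".
examples = [
    ("contact: bob@example.com", "bob@example.com"),
    ("no address here", ""),
    ("mail alice@site.org now", "alice@site.org"),
]
precision, recall = precision_recall(r"\w+@\w+\.com", examples)
```

With this candidate, the only extraction made is correct, but the `.org` address is missed, so precision is 1.0 and recall is 0.5.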
Laser speckle contrast imaging identifies endothelial-dependent and endothelial-independent microvascular dysfunction in individuals presenting with EOCAD, and thus could be valuable as an early peripheral marker of atherothrombotic disease.
The recently described severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected millions of people, with thousands of fatalities. It has prompted global efforts in research, with a focus on the pathophysiology of coronavirus disease-19 (COVID-19), and a rapid surge of publications. COVID-19 has been associated with a myriad of clinical manifestations involving the lungs, heart, kidneys, central nervous system, gastrointestinal system, and skin, as well as blood coagulation abnormalities. The endothelium plays a key role in organ dysfunction associated with severe infection, and current data suggest that it is also involved in SARS-CoV-2-induced sepsis. This critical review aimed to address a possible unifying mechanism underlying the diverse complications of COVID-19: microvascular dysfunction, with emphasis on the renin-angiotensin system. In addition, research perspectives are suggested in order to expand understanding of the pathophysiology of the infection.
Background: Microvascular dysfunction, serum cytokines and chemokines may play important roles in the pathophysiology of coronavirus disease 2019 (COVID-19), especially in severe cases. Methods: Patients with COVID-19 underwent non-invasive evaluation of systemic endothelium-dependent microvascular reactivity, using laser Doppler perfusion monitoring (LDPM) in the skin of the forearm coupled to local thermal hyperemia. Maximal microvascular vasodilatation (44 °C thermal plateau phase) was used as the endpoint. A multiplex biometric immunoassay was used to assess a panel of 48 serum cytokines and chemokines. Severe COVID-19 (S-COVID) was defined according to WHO criteria, while all other cases of COVID-19 were considered mild to moderate (M-COVID). A group of healthy individuals who tested negative for SARS-CoV-2 served as a control group and was also evaluated with LDPM. Results: Thirty-two patients with COVID-19 (25% S-COVID) and 14 controls were included. Basal microvascular flow was similar between M-COVID and controls (P=0.69) but was higher in S-COVID than in controls (P=0.005) and M-COVID patients (P=0.01). The peak microvascular vasodilator response was markedly decreased in both patient groups (M-COVID, P=0.001; S-COVID, P<0.0001) compared to the healthy group. The percent increases in microvascular flow were markedly reduced in both patient groups (M-COVID, P<0.0001; S-COVID, P<0.0001) compared to controls. Patients with S-COVID had markedly higher concentrations of dissimilar proinflammatory cytokines and chemokines, compared to patients with M-COVID. Conclusions: In patients with COVID-19, especially with S-COVID, endothelium-dependent microvascular vasodilator responses are reduced, while serum cytokines and chemokines involved in the regulation of vascular function and inflammation are increased.
Oral care is frequently suboptimal in children from developing countries, especially those suffering from severe systemic diseases. The aim of the present study was to analyze the oral epidemiological profile of 3-to-5-year-old children with congenital heart disease. Dental and medical records of children evaluated at the Dental Service of the National Institute of Cardiology, Rio de Janeiro, Brazil, were reviewed. Caries experience was reported using the dmft index. Negative behavior towards dental management was recorded. The sample consisted of 144 children aged 4.41 ± 0.95 years. The mean dmft value was 5.4 ± 4.9, and 80.5% had at least one caries lesion. Dmft index was greater in the presence of cyanotic cardiac disease and in children with negative behavior. An increase in the "missing" component of the dmft index was also found in children using medicine on a daily basis. A higher caries experience was associated with children whose fathers had only an elementary education. In conclusion, children with congenital heart disease had high levels of caries experience at a young age. Cyanosis, negative behavior, daily use of medicine, one-parent family and the educational level of fathers seem to influence caries experience in children with congenital cardiac disease.