Classifying malware correctly is an important research issue for anti-malware software producers. This paper presents an effective and efficient malware classification technique based on string information using several wellknown classification algorithms. In our testing we extracted the printable strings from 1367 samples, including unpacked trojans and viruses and clean files. Information describing the printable strings contained in each sample was input to various classification algorithms, including treebased classifiers, a nearest neighbour algorithm, statistical algorithms and AdaBoost. Using k-fold cross validation on the unpacked malware and clean files, we achieved a classification accuracy of 97%. Our results reveal that strings from library code (rather than malicious code itself) can be utilised to distinguish different malware families.
h i g h l i g h t s• A signature-free malware detection approach has been proposed. • A hybrid wrapper-Filter based malware feature selection has been proposed. • Proposed hybrid approach can take advantages from both filter and wrapper. • Models have also been validated by statistical model selection criteria such as Chi Square and Akaike information criterion (AIC).
a b s t r a c tMalware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current generation Anti-Virus (AV) engines employ a signaturetemplate type detection approach where malware can easily evade existing signatures in the database. This reduces the capability of current AV engines in detecting malware. In this paper we propose a hybrid framework for malware detection by using the hybrids of Support Vector Machines Wrapper, MaximumRelevance-Minimum-Redundancy Filter heuristics where Application Program Interface (API) call statistics are used as a malware features. The novelty of our hybrid framework is that it injects the filter's ranking score in the wrapper selection process and combines the properties of both wrapper and filters and API call statistics which can detect malware based on the nature of infectious actions instead of signature. To the best of our knowledge, this kind of hybrid approach has not been explored yet in the literature in the context of feature selection and malware detection. Knowledge about the intrinsic characteristics of malicious activities is determined by the API call statistics which is injected as a filter score into the wrapper's backward elimination process in order to find the most significant APIs. While using the most significant APIs in the wrapper classification on both obfuscated and benign types malware datasets, the results show that the proposed hybrid framework clearly surpasses the existing models including the independent filters and wrappers using only a very compact set of significant APIs. The performances of the proposed and existing models have further been compared using binary logistic regression. Various goodness of fit comparison criteria such as Chi Square, Akaike's Information Criterion (AIC) and Receiver Operating Characteristic Curve ROC are deployed to identify the best performing models. Experimental outcomes based on the above criteria also show that the proposed hybrid framework outperforms other existing models of signature types including independent wrapper and filter approaches to identify malware.
BackgroundThe prevalence of non-communicable diseases is increasing throughout the world, including developing countries.ObjectiveThe intent was to conduct a study of a preventive medical service in a developing country, combining eHealth checkups and teleconsultation as well as assess stratification rules and the short-term effects of intervention.MethodsWe developed an eHealth system that comprises a set of sensor devices in an attaché case, a data transmission system linked to a mobile network, and a data management application. We provided eHealth checkups for the populations of five villages and the employees of five factories/offices in Bangladesh. Individual health condition was automatically categorized into four grades based on international diagnostic standards: green (healthy), yellow (caution), orange (affected), and red (emergent). We provided teleconsultation for orange- and red-grade subjects and we provided teleprescription for these subjects as required.ResultsThe first checkup was provided to 16,741 subjects. After one year, 2361 subjects participated in the second checkup and the systolic blood pressure of these subjects was significantly decreased from an average of 121 mmHg to an average of 116 mmHg (P<.001). Based on these results, we propose a cost-effective method using a machine learning technique (random forest method) using the medical interview, subject profiles, and checkup results as predictor to avoid costly measurements of blood sugar, to ensure sustainability of the program in developing countries.ConclusionsThe results of this study demonstrate the benefits of an eHealth checkup and teleconsultation program as an effective health care system in developing countries.
BanglaLekha-Isolated, a Bangla handwritten isolated character dataset is presented in this article. This dataset contains 84 different characters comprising of 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 1,66,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.