For a large and evolving software system, the project team could receive many bug reports over a long period of time. It is important to achieve a quantitative understanding of bugfixing time. The ability to predict bug-fixing time can help a project team better estimate software maintenance efforts and better manage software projects. In this paper, we perform an empirical study of bug-fixing time for three CA Technologies projects. We propose a Markov-based method for predicting the number of bugs that will be fixed in future. For a given number of defects, we propose a method for estimating the total amount of time required to fix them based on the empirical distribution of bug-fixing time derived from historical data. For a given bug report, we can also construct a classification model to predict slow or quick fix (e.g., below or above a time threshold). We evaluate our methods using real maintenance data from three CA Technologies projects. The results show that the proposed methods are effective.
The proliferation of malware is a serious threat to computer and information systems throughout the world. Antimalware companies are continually challenged to identify and counter new malware as it is released into the wild. In attempts to speed up this identification and response, many researchers have examined ways to efficiently automate classification of malware as it appears in the environment. In this paper, we present a fast, simple and scalable method of classifying Trojans based only on the lengths of their functions. Our results indicate that function length may play a significant role in classifying malware, and, combined with other features, may result in a fast, inexpensive and scalable method of malware classification.
Classifying malware correctly is an important research issue for anti-malware software producers. This paper presents an effective and efficient malware classification technique based on string information using several wellknown classification algorithms. In our testing we extracted the printable strings from 1367 samples, including unpacked trojans and viruses and clean files. Information describing the printable strings contained in each sample was input to various classification algorithms, including treebased classifiers, a nearest neighbour algorithm, statistical algorithms and AdaBoost. Using k-fold cross validation on the unpacked malware and clean files, we achieved a classification accuracy of 97%. Our results reveal that strings from library code (rather than malicious code itself) can be utilised to distinguish different malware families.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.