Abstract. In this paper, we propose a hybrid semantic and statistics approach for spam email detection. An adaptive training scheme is implemented, which does not require a large training pool. Experimental results have shown promising performance.
INTRODUCTION
Unsolicited e-mail has become the bane of modern life. The impact of spam e-mails on business, government, and general activity ranges from irritating to severely deleterious. Spam has two main effects: first, it clogs the Internet, reducing effective throughput and slowing the transmission of genuine communications; second, significant aggregate time is wasted in deleting such e-mails. Whilst it may take only a few seconds for one individual to remove one spam e-mail, the total time squandered when this same e-mail is deleted by millions of recipients becomes significant, with a consequent reduction in productivity and an increasing drain on IT resources. In an attempt to combat the effect of spam, numerous e-mail filters are currently deployed on most operating systems and ISPs. Tuning these spam filters is a complex task, as a false positive, and the consequent blocking of a bona fide e-mail, has a much greater impact on the individual user than a false negative and the subsequent transmission of a spam e-mail. End users are hence reluctant to use overly aggressive spam filters, regardless of the overall impact on Internet traffic. Significant gains in global efficiency could be provided by a highly accurate spam filter.
This paper introduces a hybrid approach to the spam problem, utilising a combined semantic and statistical text-based feature extraction methodology coupled with three multilayer feedforward backpropagation neural networks operating in a combined voting architecture. A level of self-training was implemented, using high-confidence unanimous decisions to further develop the network beyond its initial training set. Significantly, good accuracy was achieved on previously unseen e-mail test cases using only 11 features and an initial training set of only 200 elements.
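As an illustration only, the voting and self-training idea described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this paper: each of the three networks is assumed to emit a spam score in [0, 1], and the function names, the 0.5 decision threshold, and the 0.1/0.9 confidence bounds are all hypothetical values chosen for the example.

```python
from typing import List, Optional, Sequence, Tuple

SPAM, HAM = 1, 0


def majority_vote(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Return SPAM when a strict majority of networks score at or above threshold.

    `scores` holds one output per network; the threshold is an assumed value.
    """
    votes = sum(1 for s in scores if s >= threshold)
    return SPAM if votes * 2 > len(scores) else HAM


def unanimous_confident_label(
    scores: Sequence[float], low: float = 0.1, high: float = 0.9
) -> Optional[int]:
    """Return a label only when every network agrees with high confidence.

    The `low` and `high` bounds are hypothetical; None means "not confident".
    """
    if all(s >= high for s in scores):
        return SPAM
    if all(s <= low for s in scores):
        return HAM
    return None


def classify_and_collect(
    features: Sequence[float],
    scores: Sequence[float],
    pool: List[Tuple[List[float], int]],
) -> int:
    """Classify one e-mail by majority vote and grow the self-training pool.

    Only unanimous, high-confidence decisions are appended to `pool`, so that
    later retraining uses examples the ensemble is most sure about.
    """
    label = majority_vote(scores)
    confident = unanimous_confident_label(scores)
    if confident is not None:
        pool.append((list(features), confident))
    return label
```

In this sketch, a single network's error is outvoted by the other two, while split or low-confidence decisions still yield a classification but are kept out of the retraining pool.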
The adaptive nature of this system, coupled with the small initial training set, suggests that a full implementation would perform well in dynamic environments, with the neural network topology lending itself to possible hardware implementations as explored in [1,2]. The voting methodology adopted by this algorithm significantly mitigates the training overhead traditionally required by neural networks, as each network in this approach is only required to find a local minimum, rather than converging to the elusive global result of a perfect system. Errors arising in any one neural network are generally accounted for by correct classifications from the other two networks, resulting in an aggregate system with a high accuracy rate, low initial training requirements, and a self-reinforcing adaptive learning capability. This paper is organised as follows: in Curr...