Taoxin Peng scite author profile

Classifier ensembles constitute one of the main research directions in machine learning and data mining. The use of multiple classifiers generally allows better predictive performance than that achievable with a single model. Several approaches exist in the literature that provide means to construct and aggregate such ensembles. However, these ensemble systems contain redundant members that, if removed, may further increase group diversity and produce better results. Smaller ensembles also relax the memory and storage requirements, reducing system's run-time overhead while improving overall efficiency. This paper extends the ideas developed for feature selection problems to support classifier ensemble reduction, by transforming ensemble predictions into training samples, and treating classifiers as features. Also, the global heuristic harmony search is used to select a reduced subset of such artificial features, while attempting to maximize the feature subset evaluation. The resulting technique is systematically evaluated using high dimensional and large sized benchmark datasets, showing a superior classification performance against both original, unreduced ensembles, and randomly formed subsets.

show abstract

A Rule Based Taxonomy of Dirty Data

Lin¹,

Peng²,

Kennedy³

2010

View full text Add to dashboard Cite

There is a growing awareness that high quality of data is a key to today's business success and that dirty data existing within data sources is one of the causes of poor data quality. To ensure high quality data, enterprises need to have a process, methodologies and resources to monitor, analyze and maintain the quality of data. Nevertheless, research shows that many enterprises do not pay adequate attention to the existence of dirty data and have not applied useful methodologies to ensure high quality data for their applications. One of the reasons is a lack of appreciation of the types and extent of dirty data. In practice, detecting and cleaning all the dirty data that exists in all data sources is quite expensive and unrealistic. The cost of cleaning dirty data needs to be considered for most of enterprises. This problem has not attracted enough attention from researchers. In this paper, a rule-based taxonomy of dirty data is developed. The proposed taxonomy not only provides a mechanism to deal with this problem but also includes more dirty data types than any of existing such taxonomies.

show abstract

Semantic-Aware Real-Time Correlation Tracking Framework for UAV Videos

Xue

Yin

et al. 2022

IEEE Trans. Cybern.

View full text Add to dashboard Cite

Discriminative correlation filter (DCF) has contributed tremendously to address the problem of object tracking benefitting from its high computational efficiency. However, it has suffered from performance degradation in unmanned aerial vehicle (UAV) tracking. This article presents a novel semanticaware real-time correlation tracking framework (SARCT) for UAV videos to enhance the performance of DCF trackers without incurring excessive computing cost. Specifically, SARCT first constructs an additional detection module to generate ROI proposals and to filter any response regarding the target irrelevant area. Then, a novel semantic segmentation module based on semantic template generation and semantic coefficient prediction is further introduced to capture semantic information, which can provide precise ROI mask, thereby effectively suppressing background interference in the ROI proposals. By sharing features and specific network layers for object detection and semantic segmentation, SARCT reduces parameter redundancy to attain sufficient speed for real-time applications. Systematic experiments are conducted on three typical aerial datasets in order to evaluate the performance of the proposed SARCT. The results demonstrate that SARCT is able to improve the accuracy of conventional DCF-based trackers significantly, outperforming state-of-the-art deep trackers.

show abstract

A Comparison of Techniques for Name Matching

Peng¹,

Li²,

Kennedy³

2012

View full text Add to dashboard Cite

Information explosion is a problem for everyone nowadays. It is a great challenge to all kinds of businesses to maintain high quality of data in their information applications, such as data integration, text and web mining, information retrieval, search engine, etc. In such applications, matching names is one of the popular tasks. There are a number of name matching techniques available. Unfortunately, there is no existing name matching technique that performs the best in all situations. Therefore, a problem that every researcher or a practitioner has to face is how to select an appropriate technique for a given dataset. This paper analyses and evaluates a set of popular name matching techniques on several carefully designed different datasets. The experimental comparison confirms the statement that there is no clear best technique. Some suggestions have been presented, which can be used as guidance for researchers and practitioners to select an appropriate name matching technique in a given dataset.

show abstract

The VoIP intrusion detection through a LVQ-based neural network

Lü

Peng

2009

View full text Add to dashboard Cite

Being a fast-growing Internet application, Voice over Internet Protocol shares the network resources with the regular Internet traffic. However it is susceptible to the existing security holes of the Internet. In this paper, a highly effective VoIP intrusion detection approach based on LVQ neural network is proposed. This detection approach is particularly suitable for protecting VoIP applications, in which various protocols are involved to provide IP telephony services. Experiments of the proposed approach show promising detection accuracy and a low runtime impact on the perceived quality of voice streams.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Taoxin Peng

Feature Selection Inspired Classifier Ensemble Reduction

A Rule Based Taxonomy of Dirty Data

Semantic-Aware Real-Time Correlation Tracking Framework for UAV Videos

A Comparison of Techniques for Name Matching

The VoIP intrusion detection through a LVQ-based neural network

Contact Info

Product

Resources

About