Abstract. In classification tasks, data labeling is an expensive and time-consuming process; hence active learning, which queries labels for only a small representative portion of the data, is becoming increasingly important. However, few works address the challenges of the data stream setting, because most active learning methods are designed for non-streaming settings. Building on this status quo, we combine an evidence-based uncertainty sampling strategy with a split sampling strategy and propose a new sampling strategy for active learning over evolving data streams that takes full advantage of the strengths of each. First, the original data stream is randomly divided into two substreams. Instances from one substream are labeled according to a high evidence-focused uncertainty strategy, while instances from the other substream are labeled by a random strategy for detecting true concept drifts. Second, we introduce a sliding window into the high evidence-focused uncertainty strategy to determine whether an instance is a conflict-uncertainty instance. Our strategy thus addresses the effective use of evidence in the data stream setting and selects more representative instances from evolving data streams for training a model. Finally, in experiments on four benchmark datasets, our approach shows good predictive performance compared with state-of-the-art active learning strategies.
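The split-sampling idea in the abstract — routing each arriving instance to either an uncertainty-driven substream or a randomly labeled substream — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `split_prob`, `uncertainty_threshold`, and `random_label_prob` parameters, and the placeholder confidence function are all assumptions for demonstration.

```python
import random

def split_sampling_stream(stream, model_confidence, uncertainty_threshold=0.6,
                          split_prob=0.5, random_label_prob=0.1):
    """Decide, for each instance in the stream, whether to query its label.

    Each instance is randomly routed to one of two substreams:
    - substream A queries labels for instances the model is uncertain about;
    - substream B queries labels uniformly at random, which gives unbiased
      probes that can reveal concept drift even where the model is confident.
    """
    queried = []  # (index, reason) pairs for instances whose labels we request
    for i, x in enumerate(stream):
        if random.random() < split_prob:
            # Substream A: uncertainty-focused labeling.
            if model_confidence(x) < uncertainty_threshold:
                queried.append((i, "uncertainty"))
        else:
            # Substream B: random labeling for drift detection.
            if random.random() < random_label_prob:
                queried.append((i, "random"))
    return queried

# Toy usage with a dummy confidence function (odd items look "uncertain"):
random.seed(0)
decisions = split_sampling_stream(range(100),
                                  lambda x: 0.3 if x % 2 else 0.9)
```

The key design point is that the two substreams serve different purposes: uncertainty sampling concentrates the labeling budget near the decision boundary, while the random substream retains a small unbiased sample of the stream so that drifts outside the current boundary region are not missed.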
Introduction

Nowadays, more and more data are generated continuously by networks, such as sensor networks, social networks, web applications, and financial activities. Unlike traditional datasets, data items within a data stream are temporally ordered, fast-changing, generally large-scale, and potentially infinite [1].

For learning predictive models on a data stream, it is in principle possible to access the true labels of instances continuously. Unfortunately, labeled instances in data streams are very scarce in practice: only a very limited number of labeled instances can be collected, and they can hardly provide enough information to train models with good generalization capabilities [2]. Manual labeling is expensive, especially in terms of time. Moreover, as time passes, the relationship between attributes and labels might change, as in spam identification and vaccine production. To obtain the true label, one must scan the mail or perform a laboratory test, which is time-consuming. Hence, querying labels for only a small representative subset of the stream has become an effective solution. Such a learning setting is known as active learning. In pool-based and online environments [3,4], active learning has received wide attention and research.

In the data stream setting, active learning is further divided into online active learning and active learning in data streams. The main difference between the two branches is whether concept drifts exist. Online active learning has a generally accepted assumption that th...