2012
DOI: 10.2478/v10006-012-0048-z

Optimal estimator of hypothesis probability for data mining problems with small samples

Abstract: The paper presents a new (to the best of the authors' knowledge) estimator of probability called the "Eph√2 completeness estimator" along with a theoretical derivation of its optimality. The estimator is especially suitable for a small number of sample items, which is the feature of many real problems characterized by data insufficiency. The control parameter of the estimator is not assumed in an a priori, subjective way, but was determined on the basis of an optimization criterion (the least absolute error…

Cited by 3 publications (9 citation statements)
References 23 publications (12 reference statements)
“…As an error measure of a probability estimation method we use the mean absolute error (abbreviated as MAE in the paper) for easier comparison with the findings reported in Piegat and Landowski (2012). Also, the preliminary experiments with another measure of error, root mean squared error (RMSE), revealed that the general observations and conclusions remain the same regardless of the error measure used.…”
Section: Historical Background and Related Work (mentioning)
confidence: 98%
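The two error measures named in the statement above can be compared on simulated small samples. The following is a minimal sketch, not the cited authors' experiment: it uses the plain relative-frequency estimator as a stand-in (not the Eph√2 estimator), and the names `frequency_estimate` and `error_measures`, the true probability, the sample size, and the trial count are all assumptions chosen for illustration.

```python
import math
import random


def frequency_estimate(successes, n):
    """Relative-frequency estimate of a probability (illustrative stand-in)."""
    return successes / n


def error_measures(estimator, true_p, n, trials=2000, seed=0):
    """Simulate `trials` samples of size `n` and return (MAE, RMSE).

    Each sample consists of n Bernoulli draws with success probability
    true_p; the estimator is applied to the observed success count.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    abs_sum = 0.0
    sq_sum = 0.0
    for _ in range(trials):
        successes = sum(1 for _ in range(n) if rng.random() < true_p)
        err = estimator(successes, n) - true_p
        abs_sum += abs(err)
        sq_sum += err * err
    mae = abs_sum / trials
    rmse = math.sqrt(sq_sum / trials)
    return mae, rmse


if __name__ == "__main__":
    mae, rmse = error_measures(frequency_estimate, true_p=0.3, n=5)
    print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```

For any set of errors, RMSE is at least as large as MAE, so the two measures differ in scale but tend to rank estimators similarly when the error distributions are comparable in shape.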
“…Formula (4) is in their paper denoted by Eph_a and has one parameter a. The theoretical optimization of the mean absolute error (MAE) with the proposed formula (4) yielded the optimal value of a = √2 (Piegat and Landowski, 2012). After the replacement with the optimized value of a, the following formula, denoted by Eph√2 in their paper, was obtained:…”
Section: Historical Background and Related Work (mentioning)
confidence: 99%
“…Figure 3 shows the Category distribution based on λ, where the operating costs are classified into "low" and "high" by setting λ = 0.69 based on Tables 2 and 4. The "high" operating cost set is denoted by columns 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 18, and 21 in Table 4. The "low" operating cost is represented by columns 5, 16, 17, 19, and 20 in Table 4.…”
Section: Case Study (mentioning)
confidence: 99%
“…In other words, the focus of the problem is how to estimate the lifecycle costs using small sample data. Relatively recent publications have provided some in-depth discussions regarding small sample estimation [4][5][6][7][8], where fuzzy clustering and support vector machine (SVM) have received special attentions [9][10][11]. Fuzzy clustering and SVM have been applied to address various problems through progression as the methodologies themselves advance, such as classification, regression, image classification, human activity, geo-marketing analysis, and drug discovery [12][13][14][15][16][17][18][19].…”
Section: Introduction (mentioning)
confidence: 99%
“…Problems concerning the small sample size and pseudoinverse appear in the most recent works (Piegat and Landowski, 2012; Röbenack and Reinschke, 2011). We propose an extension of the last approach.…”
Section: Introduction (mentioning)
confidence: 99%