2016
DOI: 10.1016/j.neucom.2016.01.089

Positive and unlabeled learning in categorical data

Abstract: In common binary classification scenarios, the presence of both positive and negative examples in training data is needed to build an efficient classifier. Unfortunately, in many domains, this requirement is not satisfied and only one class of examples is available. To cope with this setting, classification algorithms have been introduced that learn from Positive and Unlabeled (PU) data. Originally, these approaches were exploited in the context of document classification. Only few works addr…

Cited by 30 publications (24 citation statements)
References: 29 publications
“…Step 3
S-EM [62]        Spy                   EM NB               ∆E
Roc-SVM [53]     Rocchio               Iterative SVM
[107,106]        1-DNF                 Iterative SVM       Last
A-EM [55]        Augmented Negatives   EM NB               ∆F
LGN [54]         Single Negative       BN                  /
PE PUC [108]     PE (EM)               NB                  Unspecified
WVC/PSOC [77]    1-DNF*                Iterative SVM       Vote
CR-SVM [56]      Rocchio*              SVM                 /
MCLS [13]        k-means               Iterative LS-SVM    Last
C-CRNE [63]      C-CRNE                TFIPNDF             /
Pulce [37]       DILCA                 DILCA-KNN           /
PGPU [31]        PGPU                  biased SVM          /…”
Section: Methods; citation type: mentioning (confidence: 99%)
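The methods quoted above all follow the two-step PU paradigm: first extract a set of reliable negatives from the unlabeled data (Step 1), then iteratively train a classifier on the positives versus that growing negative set (Step 2). A minimal sketch, assuming hypothetical `extract_rn` and `train` callables that stand in for any concrete Step-1 and Step-2 technique:

```python
def two_step_pu(P, U, extract_rn, train, max_iter=10):
    """Generic two-step PU skeleton (an illustrative sketch, not any one method).

    extract_rn(P, U) -> set of reliable negatives    (Step 1: Spy, Rocchio, 1-DNF, ...)
    train(P, RN)     -> classifier mapping x to 0/1  (Step 2: EM-NB, iterative SVM, ...)
    """
    RN = set(extract_rn(P, U))           # Step 1: initial reliable negatives
    Q = [u for u in U if u not in RN]    # remaining unlabeled examples
    clf = None
    for _ in range(max_iter):            # Step 2: iterate until no new negatives
        clf = train(P, RN)
        moved = [q for q in Q if clf(q) == 0]
        if not moved:
            break
        RN |= set(moved)                 # newly predicted negatives join RN
        Q = [q for q in Q if clf(q) == 1]
    return clf
```

Step 3 in the table (∆E, ∆F, Last, Vote) corresponds to choosing which of the intermediate classifiers to keep; the sketch above simply returns the last one.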
“…This assumption allows identifying reliable negative examples as those that are far from all the labeled examples. This can be done by using different similarity (or distance) measures such as tf-idf for text [53] or DILCA for categorical attributes [37]. This assumption is important for two-step techniques (Section 5.1).…”
Section: Definition 5 (Smoothness); citation type: mentioning (confidence: 99%)
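The smoothness-based selection described in this statement can be sketched as follows. The `frac` parameter and the plain Euclidean distance are illustrative placeholders only; text methods would plug in a tf-idf-based similarity and categorical methods a DILCA-style measure:

```python
import numpy as np

def reliable_negatives(X_pos, X_unl, frac=0.2):
    """Return indices of unlabeled points farthest from every positive example.

    Hedged sketch of the smoothness heuristic: an unlabeled example that is
    far from all labeled positives is taken as a reliable negative.
    """
    # Pairwise distances, then each unlabeled point's nearest positive
    d = np.linalg.norm(X_unl[:, None, :] - X_pos[None, :, :], axis=2)
    nearest = d.min(axis=1)
    # Keep the fraction of unlabeled points with the largest such distance
    k = max(1, int(frac * len(X_unl)))
    return np.argsort(nearest)[-k:]
```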
“…A novel algorithm named Pulce [23] was proposed by Dino Ienco and Ruggero G. Pensa for positive and unlabeled (PU) learning in categorical data. Building an efficient classifier normally requires both positive and negative examples, but in many domains this requirement is not satisfied.…”
Section: Related Work; citation type: mentioning (confidence: 99%)
“…the values of the same context attributes. DILCA has been successfully used in different scenarios including clustering (Ienco et al 2012), semi-supervised learning (Ienco and Pensa 2016) and anomaly detection (Ienco et al 2017). However, if applied to a secret dataset, it may disclose a lot of private information.…”
Citation type: mentioning (confidence: 99%)
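As a rough illustration of what a context-based distance looks like (a hedged approximation in the spirit of DILCA, not the published formula), two values of a categorical attribute can be considered close when they induce similar conditional distributions over the context attributes:

```python
from collections import Counter
from math import sqrt

def dilca_like_distance(rows, target, context):
    """Context-based distance between values of a categorical attribute.

    `rows` is a list of dicts. The distance between two values a, b of
    `target` is the normalized Euclidean distance between their conditional
    distributions P(y | a) and P(y | b) over the context attributes.
    This is an illustrative approximation of the DILCA idea, not its
    exact definition (which also selects the context automatically).
    """
    def cond(v):
        # Conditional distribution of every context value given target == v
        sel = [r for r in rows if r[target] == v]
        dist = {}
        for attr in context:
            counts = Counter(r[attr] for r in sel)
            n = sum(counts.values())
            for y, cnt in counts.items():
                dist[(attr, y)] = cnt / n
        return dist

    # Total number of distinct context values, used as a normalizer
    n_vals = sum(len({r[a] for r in rows}) for a in context)

    def d(a, b):
        pa, pb = cond(a), cond(b)
        keys = set(pa) | set(pb)
        return sqrt(sum((pa.get(k, 0) - pb.get(k, 0)) ** 2 for k in keys) / n_vals)

    return d
```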