2018
DOI: 10.1109/tbdata.2017.2688360

Online Similarity Learning for Big Data with Overfitting

Abstract: In this paper, we propose a general model to address the overfitting problem in online similarity learning for big data, which is generally caused by two kinds of redundancy: 1) feature redundancy, i.e., there exist redundant (irrelevant) features in the training data; and 2) rank redundancy, i.e., the non-redundant (relevant) features lie in a low-rank space. To overcome these, our model is designed to obtain a simple and robust metric matrix by detecting the redundant rows and columns in t…
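The abstract describes learning a metric matrix online while suppressing redundant rows and columns. As a rough, hedged illustration of that general setting (not the authors' algorithm), the sketch below pairs an OASIS-style online update on similarity triplets with an elementwise soft-thresholding step so that rows and columns tied to irrelevant features shrink toward zero; the triplet construction, learning rate, and shrinkage parameter are assumptions made only for this example.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact algorithm):
# online bilinear similarity learning on triplets (x, x_pos, x_neg) with a
# hinge loss, followed by soft-thresholding of the metric matrix W as a crude
# way to suppress redundant (irrelevant) features.
import numpy as np

def similarity(W, a, b):
    """Bilinear similarity S_W(a, b) = a^T W b."""
    return a @ W @ b

def online_similarity_learning(triplets, dim, lr=0.1, margin=1.0, shrink=0.01):
    W = np.eye(dim)  # start from the identity (plain dot-product similarity)
    for x, x_pos, x_neg in triplets:
        # hinge loss: we want S(x, x_pos) to exceed S(x, x_neg) by the margin
        loss = margin - similarity(W, x, x_pos) + similarity(W, x, x_neg)
        if loss > 0:
            # OASIS-style rank-one update on the violated triplet
            W += lr * np.outer(x, x_pos - x_neg)
        # soft-threshold entries so rows/columns of irrelevant features decay
        W = np.sign(W) * np.maximum(np.abs(W) - shrink * lr, 0.0)
    return W

# Toy usage: 5-dimensional data where the last two features are pure noise.
rng = np.random.default_rng(0)

def make_triplet():
    x = rng.normal(size=5)
    x_pos = x + 0.1 * rng.normal(size=5)   # a similar point
    x_neg = rng.normal(size=5)             # a dissimilar point
    x[3:] = rng.normal(size=2)             # overwrite the noise dimensions
    x_pos[3:] = rng.normal(size=2)
    return x, x_pos, x_neg

W = online_similarity_learning([make_triplet() for _ in range(2000)], dim=5)
print(np.round(W, 2))   # rows/columns 3 and 4 should end up close to zero
```

On the toy data, the entries of W tied to the two noise dimensions should be driven toward zero, which is the kind of redundancy suppression the abstract alludes to.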

Cited by 23 publications (6 citation statements) | References: 30 publications
“…We will investigate the effect of additional data context types on the wrangling pipeline, and on other wrangling stages such as Web data extraction [49]. To further address time-varying variety and veracity problems in data wrangling, we will investigate feedback-based learning and model refinement techniques such as those presented in [42] or [50]. Furthermore, we are exploring how to combine evidence gained from data context with user preferences, as shown in [23], to broaden the possibilities of tailoring a data product for users with different requirements.…”
Section: Discussion (mentioning)
confidence: 99%
“…Feature selection is the process of removing irrelevant or redundant features while preserving important ones, so that the remaining features can describe the model more accurately. Redundant and irrelevant features act as noise in machine learning models: they waste computational effort and cause the model to overfit, reducing its accuracy [11]. There are three families of feature selection methods: filter, wrapper, and embedded selection.…”
Section: Feature Selection (mentioning)
confidence: 99%
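The statement above distinguishes filter, wrapper, and embedded feature selection. As a small, hedged illustration of the filter family only (the data, threshold, and scoring rule are assumptions made for this example), the sketch below scores each feature by its absolute Pearson correlation with the target and keeps those above a threshold:

```python
# Illustrative filter-method feature selection: rank features by absolute
# Pearson correlation with the target and keep those above a threshold.
# (Wrapper and embedded methods would instead score features through a model.)
import numpy as np

def filter_select(X, y, threshold=0.1):
    """Return indices of columns of X with |corr(X[:, j], y)| >= threshold."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.where(scores >= threshold)[0], scores

# Toy data: 6 features, only the first two actually drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=500)

kept, scores = filter_select(X, y, threshold=0.2)
print("correlation scores:", np.round(scores, 2))
print("kept feature indices:", kept)   # expected: [0 1]
```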
“…al. [1] describes a model to control the overfitting problem in online similarity learning for big data. The model yields a simple and robust metric matrix by detecting redundant rows and columns in the metric matrix.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…In the current research environment, big data [1] plays a major role in managing high volumes of data. Many sectors, such as agriculture, banking, and online marketing, have adopted big data and analytics.…”
Section: Introduction (mentioning)
confidence: 99%