2021
DOI: 10.3390/e23030305

Solvable Model for the Linear Separability of Structured Data

Marco Gherardi

Abstract: Linear separability, a core concept in supervised machine learning, refers to whether the labels of a data set can be captured by the simplest possible machine: a linear classifier. In order to quantify linear separability beyond this single bit of information, one needs models of data structure parameterized by interpretable quantities, and tractable analytically. Here, I address one class of models with these properties, and show how a combinatorial method allows for the computation, in a mean field approximation, […]
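For context on the combinatorial approach mentioned in the abstract, the classic baseline for unstructured data is Cover's function-counting theorem: for p points in general position in R^n, the number of dichotomies realizable by a linear classifier through the origin is

$$
C(p, n) \;=\; 2 \sum_{k=0}^{n-1} \binom{p-1}{k},
$$

so the separable fraction C(p,n)/2^p drops from 1 to 0 around p ≈ 2n for large n. This equation is standard background, not quoted from the paper; models of structured data of the kind described in the abstract modify this count.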

Cited by 8 publications (4 citation statements) | References 40 publications
“…Intuitively, separation of the two classes by the last layer is facilitated whenever M₊(T) and M₋(T), at the final epoch T, are small or far apart. This intuition is confirmed by analytical results obtained for the perceptron [13,29]. Our analysis is based on a simple descriptor of manifold extension, the gyration radius, a metric proxy of the set's extension in Euclidean space.…”
Section: Introduction (supporting)
confidence: 69%
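The excerpt above invokes the gyration radius as a metric proxy for a manifold's extension. A minimal sketch, assuming the usual definition (root-mean-square distance of the points from their centroid); the arrays M_plus and M_minus are hypothetical stand-ins for the two class manifolds M₊(T) and M₋(T), not data from the cited work:

```python
import numpy as np

def gyration_radius(points: np.ndarray) -> float:
    """Root-mean-square distance of a point cloud from its centroid,
    used here as a simple descriptor of the cloud's spatial extension."""
    centroid = points.mean(axis=0)
    return float(np.sqrt(((points - centroid) ** 2).sum(axis=1).mean()))

# Hypothetical stand-ins for the class manifolds at the final epoch T.
rng = np.random.default_rng(0)
M_plus = rng.standard_normal((200, 16))         # class +1 activations
M_minus = rng.standard_normal((200, 16)) + 3.0  # class -1 activations, shifted away

print(gyration_radius(M_plus), gyration_radius(M_minus))
```

Small gyration radii, together with well-separated centroids, correspond to the easy case for last-layer separation discussed in the excerpt.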
“…The goal of theorems in SLT is to provide distribution-independent uniform bounds on the deviation between the generalisation and training errors. The formulation and the derivation of these theorems reveal a source of possible reasons for their poor quantitative performance: (i) empirically relevant data distributions may lead to smaller typical deviations than the worst possible case [27][28][29][30][31]; (ii) uniform bounds hold for all possible functions in the model, but better bounds may hold when one restricts the analysis to functions that perform well on specific (and significant) training sets.…”
Section: Introduction (mentioning)
confidence: 99%
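To make the excerpt's "distribution-independent uniform bounds" concrete, one common textbook VC-type form (constants differ across derivations; this is illustrative background, not the specific bound used by the citing paper) states that, with probability at least 1 − δ over m i.i.d. training samples,

$$
\sup_{f \in \mathcal{F}} \bigl| R(f) - \hat{R}_m(f) \bigr|
\;\le\;
\sqrt{\frac{8}{m} \left( d \ln \frac{2em}{d} + \ln \frac{4}{\delta} \right)},
$$

where R is the generalisation error, \hat{R}_m the training error, and d the VC dimension of the hypothesis class \mathcal{F}. The bound holds for every data distribution, which is exactly why it can be loose for the empirically relevant ones, as point (i) of the excerpt notes.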
“…In other words, an admissible classification is one that assigns the same label to sufficiently similar, or correlated, inputs. In [11][12][13], it is conjectured that a measure of the number of admissible classifications alone within a certain hypothesis class should be a better bound for the generalization error than the classic data-independent measures of complexity. Indeed, while the number of all realizable classifications usually grows monotonically with the number of inputs to classify (more points, more ways to label them), the admissibility constraint adds a competing effect of excluded volume (more points, less space available for a separating surface enforcing an admissible classification), which is eventually dominant as the number of inputs grows.…”
Section: Introduction (mentioning)
confidence: 99%
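The excluded-volume argument in the excerpt above can be illustrated numerically. The sketch below is a toy of my own construction, not the paper's calculation: it takes a simple instance of the admissibility constraint (nearby pairs of points must share a label) and compares, by brute-force enumeration with a linear-programming feasibility check, the number of linearly realizable dichotomies with and without that constraint.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def separable(X, y):
    """Return True if labels y in {+1, -1} are realizable by a homogeneous
    linear classifier sign(w . x). Feasibility of y_i * (w . x_i) >= 1 for all i
    is checked as a linear program with a zero objective."""
    A_ub = -y[:, None] * X          # y_i * (w . x_i) >= 1  <=>  (-y_i x_i) . w <= -1
    b_ub = -np.ones(len(y))
    res = linprog(c=np.zeros(X.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * X.shape[1], method="highs")
    return res.status == 0

rng = np.random.default_rng(1)
n, pairs = 4, 4                                   # input dimension, number of 2-point "objects"
centers = rng.standard_normal((pairs, n))
X = np.repeat(centers, 2, axis=0) + 0.1 * rng.standard_normal((2 * pairs, n))

# All dichotomies of the 2*pairs individual points...
all_count = sum(separable(X, np.array(s))
                for s in itertools.product((-1, 1), repeat=2 * pairs))
# ...versus only the admissible ones, where both points of a pair share a label.
adm_count = sum(separable(X, np.repeat(np.array(s), 2))
                for s in itertools.product((-1, 1), repeat=pairs))
print(all_count, adm_count)
```

Comparing all_count and adm_count while increasing the number of objects gives a rough numerical handle on the competition between the growth of realizable labelings and the excluded-volume effect described in the excerpt.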