2018
DOI: 10.1007/978-3-030-01418-6_38
Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks

Cited by 139 publications (80 citation statements)
References 3 publications
“…Otherwise, a smaller weight is assigned. Here, we use the cosine similarity metric [27] to measure the similarity between the warped features and the features extracted from the reference frame. Moreover, we do not directly use the convolutional features obtained from N feat (I).…”
Section: Model Design (mentioning)
confidence: 99%
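The statement above describes weighting warped features by how well they agree with the reference frame. A minimal NumPy sketch of that idea, with hypothetical names and tensor shapes (the cited paper's feature extractor N feat (I) and exact tensor layout are not shown here):

```python
import numpy as np

def cosine_weights(warped_feat, ref_feat, eps=1e-8):
    """Per-location cosine similarity between a warped feature map and the
    reference-frame feature map; usable as adaptive aggregation weights.
    Both inputs are assumed to have shape (C, H, W)."""
    # Rescale each spatial location's C-dimensional feature vector to unit length.
    warped_unit = warped_feat / (np.linalg.norm(warped_feat, axis=0, keepdims=True) + eps)
    ref_unit = ref_feat / (np.linalg.norm(ref_feat, axis=0, keepdims=True) + eps)
    # Dot product of unit vectors = cosine similarity; one weight per location.
    return np.sum(warped_unit * ref_unit, axis=0)  # shape (H, W), values in [-1, 1]

# Well-aligned locations receive weights close to 1, misaligned ones are down-weighted.
ref = np.random.randn(64, 8, 8)
warped = ref + 0.1 * np.random.randn(64, 8, 8)
weights = cosine_weights(warped, ref)
print(weights.shape, weights.min(), weights.max())
```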
“…In our model, as a result of normalizing embeddings and columns of the weight matrix, the magnitude differences do not affect the prediction as long as the angle between the normalized vectors remains the same, since the inner product w_i φ(x) ∈ [−1, 1] now measures cosine similarity. Recent work in cosine normalization [9] discusses a similar idea of replacing the inner product with a cosine similarity for bounded activations and stable training, while we arrive at this design from a different direction. In particular, this establishes a symmetric relationship between normalized embeddings and weights, which enables us to treat them interchangeably.…”
Section: Model Architecture (mentioning)
confidence: 99%
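A minimal NumPy sketch (function and variable names are my own, not from the cited paper) of the symmetric normalization described above: both the embedding and each weight column are rescaled to unit length, so every logit is a cosine similarity in [−1, 1] and is invariant to magnitude changes on either side:

```python
import numpy as np

def cosine_logits(phi_x, W, eps=1e-8):
    """Logits of a cosine-normalized classifier: the embedding phi_x (shape (d,))
    and every column of the weight matrix W (shape (d, num_classes)) are rescaled
    to unit length, so each logit w_i . phi_x lies in [-1, 1] and depends only on
    the angle between the two vectors."""
    phi_unit = phi_x / (np.linalg.norm(phi_x) + eps)
    W_unit = W / (np.linalg.norm(W, axis=0, keepdims=True) + eps)
    return W_unit.T @ phi_unit

# Rescaling either side leaves the prediction unchanged, since only the angle matters.
phi = np.random.randn(128)
W = np.random.randn(128, 10)
print(np.allclose(cosine_logits(phi, W), cosine_logits(5.0 * phi, 0.3 * W)))  # True
```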
“…Traditional multi-layer neural networks use the dot product between the output vector of the previous layer and the incoming weight vector as the input to the activation function. [23,11] recently showed that replacing the dot product with cosine similarity can bound and reduce the variance of the neurons and thus yield models with better generalization. This is especially relevant here, since we are calculating the correlation between data from two dramatically different domains, in particular the attribute domain, in which the features are discontinuous and have high variance.…”
Section: Zero-shot Learning (mentioning)
confidence: 99%
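A small, purely illustrative NumPy experiment (dimensions and distributions are made up) showing the bounding effect this statement refers to: the raw dot product grows with the input magnitude, while the cosine form stays in [−1, 1] with much smaller variance:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=512)                             # one neuron's weight vector
# Inputs whose magnitudes vary widely, mimicking high-variance features.
X = rng.normal(size=(10_000, 512)) * rng.uniform(0.1, 10.0, size=(10_000, 1))

dot = X @ w                                          # unbounded pre-activations
cos = dot / (np.linalg.norm(X, axis=1) * np.linalg.norm(w) + 1e-8)  # in [-1, 1]

print(f"dot product: var={dot.var():.1f}  range=[{dot.min():.1f}, {dot.max():.1f}]")
print(f"cosine:      var={cos.var():.4f}  range=[{cos.min():.2f}, {cos.max():.2f}]")
```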
“…We speculate the reason is that the class attribute values are not continuous, so there is large variance among the attribute vectors of different classes. Consequently, the classifier weights derived from them also possess large variance, which might cause high variance in the inputs to the Softmax activation function [23]. Unlike the dot product, our cosine-similarity-based score function normalizes the classifier weights before calculating their dot product with the visual embeddings.…”
Section: Ablation Studies (mentioning)
confidence: 99%
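A minimal sketch of such a score function, under the assumption of a simple linear compatibility model (all names are hypothetical): the attribute-derived classifier weights are normalized so that their very different magnitudes no longer dominate the class scores. Note that this sketch normalizes the visual embedding as well, which is what a full cosine similarity does; the quoted paper emphasizes normalizing the classifier weights.

```python
import numpy as np

def zsl_scores(visual_embedding, attribute_matrix, eps=1e-8):
    """Cosine-similarity score function for zero-shot classification.
    attribute_matrix has shape (num_classes, d): one attribute-derived weight
    vector per class. Normalizing the rows (and the visual embedding) removes
    the large magnitude differences between class attribute vectors, so each
    score is bounded in [-1, 1]."""
    A = attribute_matrix / (np.linalg.norm(attribute_matrix, axis=1, keepdims=True) + eps)
    v = visual_embedding / (np.linalg.norm(visual_embedding) + eps)
    return A @ v

# Hypothetical example: 5 unseen classes described by 85-dim attribute vectors
# whose scales differ by two orders of magnitude.
attrs = np.random.rand(5, 85) * np.array([[1.0], [10.0], [100.0], [1.0], [50.0]])
v = np.random.rand(85)
print(zsl_scores(v, attrs))  # scores remain comparable despite the scale gap
```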