2020 International Conference on Machine Learning and Cybernetics (ICMLC) 2020
DOI: 10.1109/icmlc51923.2020.9469046

Transfer Learning with Shapeshift Adapter: A Parameter-Efficient Module for Deep Learning Model

Cited by 5 publications (8 citation statements) | References 6 publications
“…R-Transformer [12] directly alleviates the speed limitation caused by the inability to parallelize by using a local RNN. FLOATER [3] abandons the traditional RNN model and instead uses neural ordinary differential equations to recursively compute the positional information of text. Because its computation is simpler than an RNN's, it mitigates the speed problem caused by the inability to parallelize.…”
Section: Related Work
confidence: 99%
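The FLOATER-style scheme mentioned above can be illustrated with a short sketch: position representations evolve under a learned ODE and are obtained by recursive integration, rather than from a fixed sinusoidal table or an RNN. This is a minimal sketch assuming a simple explicit Euler discretization; the names ODEFunc and floater_positions, the network shape, and the step size are illustrative assumptions, not FLOATER's exact formulation.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Hypothetical dynamics network h(t, p) defining dp/dt."""
    def __init__(self, d_model: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model + 1, d_model),
            nn.Tanh(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, t: float, p: torch.Tensor) -> torch.Tensor:
        # Append the scalar "time" (position index) to the current state.
        t_col = torch.full((p.shape[0], 1), t)
        return self.net(torch.cat([p, t_col], dim=-1))

def floater_positions(func: ODEFunc, seq_len: int, d_model: int,
                      step: float = 1.0) -> torch.Tensor:
    """Recursively integrate p(t) across positions with explicit Euler steps."""
    p = torch.zeros(1, d_model)           # p(0) = 0
    states = [p]
    for i in range(seq_len - 1):
        p = p + step * func(float(i), p)  # one Euler step per position
        states.append(p)
    return torch.cat(states, dim=0)       # (seq_len, d_model)

# Usage: continuous position encodings for a 16-token sequence, width 32.
pe = floater_positions(ODEFunc(32), seq_len=16, d_model=32)
print(pe.shape)  # torch.Size([16, 32])
```

Note that, as the quoted statement observes, each Euler step is far cheaper than an RNN cell, even though the integration itself remains sequential over positions.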
“…It can be seen from formula (1) that RNNs actually compute weights for the data at different time steps and, after feature processing, perform a weighted summation according to those weights. Research shows that, in transformers, feature extraction from text is mainly performed by the self-attention mechanism [3], while the main role of RNNs is to model the temporal information of the text. In other words, simplifying the parameters used for feature extraction in RNNs has little impact on the accuracy of feature extraction but greatly improves the computation speed of the models.…”
Section: Cumsum Calculation
confidence: 99%
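The cumsum idea in this statement can be sketched briefly: a causal weighted summation over earlier time steps is a prefix sum, so it can be computed for all positions in one parallel torch.cumsum pass instead of a step-by-step recurrence. The scalar sigmoid gate and the name CumsumMixer are assumptions made for illustration, not the cited paper's exact design.

```python
import torch
import torch.nn as nn

class CumsumMixer(nn.Module):
    """Replace the sequential RNN recurrence with a parallel prefix sum:
    each position receives a normalized weighted sum of all earlier steps."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(d_model, 1)  # one scalar weight per time step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        w = torch.sigmoid(self.gate(x))               # (batch, seq_len, 1)
        num = torch.cumsum(w * x, dim=1)              # running weighted sum
        den = torch.cumsum(w, dim=1).clamp_min(1e-6)  # running weight total
        return num / den                              # causal weighted average

# Usage: unlike an RNN, all positions are produced in a single parallel pass.
y = CumsumMixer(64)(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

This matches the statement's point: the heavy per-step feature transformation is dropped, leaving only a cheap, fully parallelizable temporal aggregation.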
“…For example, due to the nature of adaptive step-size ODE solvers, it is common for many consecutive layers to be dynamically equivalent; in [41], this problem is addressed by applying optimal transport theory to encourage simpler trajectory dynamics. Recent developments extend these ideas to continuous-time video forecasting [13] and continuous attention architectures [42], [43]. However, their application to anytime human 3D pose forecasting has not been explored.…”
Section: Related Work
confidence: 99%
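As a rough illustration of the regularization idea attributed to [41], the sketch below integrates a neural ODE while accumulating a kinetic-energy penalty, an integral of the squared velocity along the trajectory; adding this penalty to the training loss favors straighter dynamics that adaptive solvers can traverse in fewer steps. The fixed-step Euler solver and the Dynamics network here are simplifying assumptions, not the cited method.

```python
import torch
import torch.nn as nn

class Dynamics(nn.Module):
    """Hypothetical dynamics network f(t, z) for z' = f(t, z)."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                 nn.Linear(dim, dim))

    def forward(self, t: float, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def integrate_with_kinetic_penalty(f: Dynamics, z0: torch.Tensor,
                                   n_steps: int = 8):
    """Euler-integrate z' = f(t, z) on [0, 1] and accumulate the
    kinetic-energy penalty used as an auxiliary training loss."""
    dt = 1.0 / n_steps
    z, energy = z0, z0.new_zeros(())
    for k in range(n_steps):
        v = f(k * dt, z)
        energy = energy + (v ** 2).sum(dim=-1).mean() * dt  # ~ integral of ||v||^2
        z = z + dt * v                                      # one Euler step
    return z, energy

# Usage: add `lam * energy` to the task loss to favor simple trajectories.
z1, energy = integrate_with_kinetic_penalty(Dynamics(16), torch.randn(4, 16))
print(z1.shape, float(energy))
```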