In transformers, positional encoding compensates for the attention mechanism's inability to capture the positional relationships between words. Previous research on temporal modeling in transformers has used recursive positional encoding based on Recurrent Neural Networks (RNNs) and relative positional encoding. Recursive positional encoding captures the linear structure of text but cannot be parallelized, which limits speed; relative positional encoding, in contrast, ignores the linear structure of text and therefore performs worse than recursive positional encoding on short text classification. To address these issues, we propose sumformer, a model that differs from other transformers mainly in two components: cumsum calculation and summer initialization. The cumsum calculation simplifies the feature-extraction part of an RNN by substitution, replacing the RNN's dynamic gating functions with static trainable positional parameters while preserving the recursive structure; this allows the model to capture the linear structure of text through a cumulative-sum computation at a much lower time cost than an RNN. The summer initialization method bounds the maximum standard deviation of the positional parameters, so that at initialization the model attends to multiple levels of textual information and has a richer optimization space, thereby improving convergence. Experimental results show that sumformer achieves roughly a 3% improvement in performance and a 58% improvement in speed over existing transformers based on recursive positional encoding: it classifies short text both better and faster, and summer initialization improves performance without increasing training or inference time.
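The cumsum idea above can be illustrated with a minimal sketch (the function and parameter names here are our own assumptions, not the paper's exact formulation): where an RNN computes h_t = f(h_{t-1}, x_t) with a dynamic, input-dependent transition, static per-position weights make each state a weighted prefix sum, which is computable in parallel.

```python
import numpy as np

def cumsum_encoding(x, w):
    """Hypothetical cumsum-style positional encoding.

    x: (seq_len, d) token embeddings.
    w: (seq_len,) static trainable position weights (illustrative stand-in
       for the RNN's dynamic gating functions).

    Returns h with h[t] = sum_{i <= t} w[i] * x[i], i.e. the recursive
    accumulation h[t] = h[t-1] + w[t] * x[t], but evaluated in one
    parallelizable cumulative sum instead of a sequential loop.
    """
    return np.cumsum(w[:, None] * x, axis=0)

# Toy usage: three tokens, embedding dimension 2.
x = np.arange(6, dtype=float).reshape(3, 2)
w = np.array([1.0, 0.5, 0.25])
h = cumsum_encoding(x, w)
```

Because the weights are fixed at each position rather than computed from the hidden state, the whole sequence can be processed with vectorized operations, which is the source of the speed advantage over a true recurrence.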