2022
DOI: 10.48550/arxiv.2204.00595
Preprint

Monarch: Expressive Structured Matrices for Efficient and Accurate Training

Abstract: Large neural networks excel in many domains, but they are expensive to train and fine-tune. A popular approach to reduce their compute/memory requirements is to replace dense weight matrices with structured ones (e.g., sparse, low-rank, Fourier transform). These methods have not seen widespread adoption (1) in end-to-end training due to unfavorable efficiency-quality tradeoffs, and (2) in dense-to-sparse fine-tuning due to lack of tractable algorithms to approximate a given dense weight matrix. To address these…
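
As a rough illustration of the general idea in the abstract (replacing a dense weight matrix with a structured one), the sketch below uses a low-rank factorization, one of the structured families the abstract lists. It is not the Monarch parameterization, whose definition does not appear in this excerpt; the sizes `n` and `r` and all function names are assumptions chosen for the example.

```python
import random

def dense_matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def matmul(A, B):
    """Plain matrix product, used only to build the dense reference matrix."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def low_rank_matvec(U, V, x):
    """Compute (U V) x as U (V x), never forming the n x n product."""
    return dense_matvec(U, dense_matvec(V, x))

# Hypothetical sizes: an n x n weight matrix of rank r.
n, r = 8, 2
rng = random.Random(0)
U = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(n)]  # n x r
V = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(r)]  # r x n
x = [rng.uniform(-1, 1) for _ in range(n)]

W = matmul(U, V)                      # dense equivalent, n x n
y_dense = dense_matvec(W, x)          # about n * n multiplications
y_structured = low_rank_matvec(U, V, x)  # about 2 * n * r multiplications

dense_params = n * n          # parameters stored by the dense layer
low_rank_params = 2 * n * r   # parameters stored by the two factors
```

Both products give the same output up to floating-point rounding, while the factored form stores and multiplies fewer numbers whenever `r` is less than `n / 2`. The efficiency-quality tradeoff the abstract mentions arises because a general dense matrix is not exactly low-rank, so the structured form can only approximate it.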

Citations: cited by 1 publication (2 citation statements)
References: 72 publications (105 reference statements)
“…These supports are interesting because they are those taken at the first two steps of the hierarchical algorithm in [25,44] for approximating a matrix by a product of N butterfly factors [25]. The first pair of support constraints (I_1, J_1) is also equivalent to the recently proposed Monarch parameterization [9]. Both pairs (I_1, J_1) and (I_2, J_2) are proved to satisfy Theorem 3.…”
Section: Absence Of Correlation Between Tractability and Benign Lands...
mentioning
confidence: 99%
“…While revising this manuscript we heard about the work of Dao et al. [9] introducing the "Monarch" class of structured matrices, essentially corresponding to the first stage of the recursion from [25,44].…”
mentioning
confidence: 99%