2021
DOI: 10.48550/arxiv.2111.05193
Preprint

A Survey on Green Deep Learning

Abstract: In recent years, larger and deeper models have been springing up, continuously pushing state-of-the-art (SOTA) results across fields such as natural language processing (NLP) and computer vision (CV). Despite these promising results, it should be noted that the computations required by SOTA models have increased at an exponential rate. Massive computations not only have a surprisingly large carbon footprint but also negatively affect research inclusiveness and deployment in real-world applic…

Cited by 18 publications
(21 citation statements)
References 126 publications
“…Green AI. Witnessing the exponential growth in the computations of big AI models [13,5,42], the concept of Green AI has attracted mounting attention in recent years [55,67]. Rather than being merely obsessed with accuracy, Green AI advocates making efficiency an important measure of AI models, championing greener approaches that are more inclusive to the research community.…”
Section: Related Work
confidence: 99%
“…To this end, we strive to devise a new and green approach for MIM with hierarchical models, in the spirit of Green AI [55,67]. Our work focuses on extending the asymmetric encoder-decoder architecture of MAE to hierarchical vision transformers, particularly the representative model Swin Transformer [43], for the sake of efficient pre-training on visible patches only.…”
Section: Introduction
confidence: 99%
“…Consequently, only using model size to assess model efficiency may be inadequate; 3) Actual Inference Time, which is the most intuitive metric for efficiency evaluation. However, since the actual inference time is heavily related to both hardware environment and software implementation, and some algorithms may be hardware-specialized, it is challenging to make a fair comparison between models run on different infrastructures [1233]. In these cases, it is critical to propose new metrics which could comprehensively and faithfully assess model efficiency.…”
Section: Inadequate Metric
confidence: 99%
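The excerpt above notes that raw wall-clock inference time is hard to compare fairly across hardware and software stacks, so single measurements are easy to misread. As a hypothetical illustration (the `benchmark` helper and toy workload below are not from the survey), a careful timing protocol typically adds warm-up runs and reports a robust aggregate such as the median over many repeats:

```python
import statistics
import time

def benchmark(fn, warmup=10, repeats=100):
    """Time a callable: run warm-ups first (to exclude one-off costs such as
    caching or JIT compilation), then return the median latency in seconds
    over many repeats, which is more robust to scheduler noise than a
    single measurement."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Toy workload standing in for a model's forward pass.
def toy_forward():
    return sum(i * i for i in range(10_000))

median_s = benchmark(toy_forward)
print(f"median latency: {median_s * 1e3:.3f} ms")
```

Even with such a protocol, numbers from different machines remain incomparable, which is why the excerpt argues for complementary, hardware-independent efficiency metrics.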
“…Using larger models will also likely lead to some improvements. However, we do not run experiments with huge pre-training data and giant models due to environmental considerations [61,62], and we try to make our experiments as "green" as possible.…”
Section: A1 Limitations and Potential Social Impacts
confidence: 99%