2022
DOI: 10.48550/arxiv.2203.12533
Preprint

Pathways: Asynchronous Distributed Dataflow for ML

Cited by 8 publications (7 citation statements). References: 0 publications.
“…The development of distributed training techniques has substantially accelerated the pace of training larger models [14,29]. In Nebula-I, the training environment contains two parts, i.e.…”
Section: Parallelization Layer (mentioning)
Confidence: 99%
“…Experiencing-imagining-observing-explaining and mental self-organization (codopoiesis) give rise to ‘extreme generalization’, which machine intelligence has so far lacked (Imagination: power of abstract modeling of hypothetical situations; human cognition is capable of extreme generalization, quickly adapting to radically novel situations [109]; new machine-learning systems and research ideas are discussed in [74]).…”
Section: On the Concept of “Strong Intelligence” (unclassified)
“…PaLM [10], aka. Pathways Language Model, is a densely-activated decoder-only transformer language model trained using Pathways [15], a large-scale ML accelerator orchestration system that enables highly efficient training across TPU pods. At the time of release, PaLM 540B achieved breakthrough performance on a suite of multi-step reasoning tasks [10].…”
Section: Models (mentioning)
Confidence: 99%
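
For context on what "orchestration across TPU pods" means from the user's side: below is a minimal sketch of the kind of client-side JAX program that a Pathways-style runtime can schedule across accelerators. The mesh axis names, array shapes, and the forward function are illustrative assumptions, not taken from the paper or the citing works.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a device mesh over whatever accelerators are available.
# Axis names and the mesh shape are illustrative assumptions.
devices = np.array(jax.devices()).reshape(-1, 1)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard activations along the "data" axis, weights along "model".
x_sharding = NamedSharding(mesh, P("data", None))
w_sharding = NamedSharding(mesh, P(None, "model"))

@jax.jit
def forward(x, w):
    # One dense layer: the compiler partitions the matmul according to
    # the input shardings and inserts the required collectives.
    return jnp.tanh(x @ w)

x = jax.device_put(jnp.ones((8, 128)), x_sharding)
w = jax.device_put(jnp.ones((128, 256)), w_sharding)
print(forward(x, w).shape)  # (8, 256)
```

The point of the sketch is the division of labor: the client only declares shardings and a jitted computation, while the runtime (Pathways, in the cited setup) handles asynchronous placement and execution of the resulting program across many devices.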