2020
DOI: 10.48550/arxiv.2006.16823
Preprint

Technical Report: Auxiliary Tuning and its Application to Conditional Text Generation

Cited by 2 publications (6 citation statements)
References 9 publications
“…Side tuning [32] adds a side model that learns a residual on top of the original model. Similarly, [31] supplements the pre-trained TLM with an external model that shifts the output distribution. An important difference with our work is that we consider that the attribute model already exists in the TLM rather than using external models as in…”
Section: Related Work
confidence: 99%
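
The statement above describes the mechanism this report centers on: a small trainable model adds a correction on top of a frozen pre-trained language model, shifting its output distribution toward the desired condition. The sketch below is a minimal, hypothetical illustration of that additive-logits idea; TinyLM, AuxiliaryTunedLM, and all interfaces here are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in for a pre-trained language model head: token ids -> logits.
    (Hypothetical toy model, not the paper's architecture.)"""
    def __init__(self, vocab_size: int = 100, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(input_ids))  # (batch, seq, vocab)

class AuxiliaryTunedLM(nn.Module):
    """Frozen base LM plus a small trainable auxiliary model whose logits
    are added to the base logits, so softmax(base + aux) is a shifted
    version of the pre-trained distribution."""
    def __init__(self, base_lm: nn.Module, aux_lm: nn.Module):
        super().__init__()
        self.base_lm = base_lm
        for p in self.base_lm.parameters():
            p.requires_grad = False  # pre-trained weights stay fixed
        self.aux_lm = aux_lm         # only this component is trained

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        base_logits = self.base_lm(input_ids)
        aux_logits = self.aux_lm(input_ids)
        return base_logits + aux_logits  # shifted output distribution

# Usage: only the auxiliary model receives gradients during training.
model = AuxiliaryTunedLM(TinyLM(), TinyLM())
logits = model(torch.randint(0, 100, (2, 8)))  # -> shape (2, 8, 100)
```

In the actual method, the auxiliary model would also take the conditioning signal (e.g., a target attribute) as input; that argument is omitted here to keep the stub self-contained and runnable.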
“…Large and powerful language models [5] based on the Transformer architecture [28] (TLMs) achieve impressive performance [23,6]. However, such powerful models present several disadvantages: (i) they are difficult to train due to both the size of models and datasets, and the compute resources required; (ii) TLMs inherit and perpetuate biases that can have a negative social impact [1]; and (iii) conditioning these models on concepts requires re-training [14] or using additional parameters [9,32,31], and is limited to very specific concepts.…”
Section: Introduction
confidence: 99%