Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1188

Adaptive Parameterization for Neural Dialogue Generation

Abstract: Neural conversation systems generate responses based on the sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with a single set of learned parameters to generate responses for given input contexts. When confronted with diverse conversations, its adaptability is limited and it is prone to generating generic responses. In this work, we propose an Adaptive Neural Dialogue generation model, ADAND, which manages various conversations with conversation-specific parameterization.…
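To make "conversation-specific parameterization" concrete, here is a minimal sketch assuming a hypernetwork-style design, where a pooled context vector produces the weights of a decoder projection on the fly. The class and variable names are hypothetical illustrations, not taken from the ADAND paper.

```python
# Sketch: per-conversation parameters generated from a context vector.
# Shapes and names are illustrative only; this is not the paper's code.
import torch
import torch.nn as nn

class AdaptiveProjection(nn.Module):
    def __init__(self, ctx_dim: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        # Hypernetwork: maps a pooled context vector to a flat weight matrix.
        self.weight_gen = nn.Linear(ctx_dim, in_dim * out_dim)
        self.bias_gen = nn.Linear(ctx_dim, out_dim)

    def forward(self, context: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # context: (batch, ctx_dim) pooled conversation representation
        # hidden:  (batch, in_dim)  decoder state to be projected
        W = self.weight_gen(context).view(-1, self.out_dim, self.in_dim)
        b = self.bias_gen(context)
        # Per-example projection: each conversation gets its own parameters.
        return torch.bmm(W, hidden.unsqueeze(-1)).squeeze(-1) + b

ctx = torch.randn(4, 128)   # pooled context embeddings
h = torch.randn(4, 256)     # decoder hidden states
logits = AdaptiveProjection(128, 256, 512)(ctx, h)  # shape (4, 512)
```

The design choice this illustrates: instead of one global weight matrix shared by all inputs, the effective parameters vary with the conversation, which is one plausible way to reduce generic responses.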

Cited by 8 publications (7 citation statements). References: 28 publications.
“…Second, replay approaches [34,2,25] (or rehearsal approaches) replay examples of previous tasks while training the model on a new one. Third, architecture-based approaches [4,20,40] rely on decomposition of the inference function. For instance, new approaches leveraging neural architecture search techniques [20,40] have been proposed.…”
Section: Related Work
confidence: 99%
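The replay idea in the excerpt above can be captured in a few lines: keep a bounded sample of earlier tasks' examples and mix them into each new task's batches. A minimal sketch follows, assuming a reservoir-sampled buffer; the class name and interface are hypothetical.

```python
# Sketch of experience replay for continual learning. Reservoir sampling
# keeps the buffer an unbiased sample of the stream seen so far.
import random

class ReplayBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example) -> None:
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Keep each new example with probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k: int):
        return random.sample(self.data, min(k, len(self.data)))

buffer = ReplayBuffer(capacity=2)
for ex in ["task1-a", "task1-b", "task2-a", "task2-b"]:
    buffer.add(ex)
# A training step on a new task would mix buffer.sample(k) into each batch.
print(buffer.sample(2))
```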
“…Ha et al. (2017) propose the general idea of generating the parameters of a network by another network. The model proposed in Cai et al. (2019) generates the parameters of an encoder-decoder architecture by referring to the context-aware and topic-aware input. Suarez (2017) uses a hypernetwork to scale the weights of the main recurrent network.…”
Section: Related Work
confidence: 99%
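The weight-scaling variant mentioned last in this excerpt is lighter than generating full weight matrices: the hypernetwork emits only per-row scaling factors applied to a fixed base weight. A hedged sketch in that spirit, with hypothetical names:

```python
# Sketch of hypernetwork weight scaling (in the spirit of Suarez, 2017):
# the hypernetwork output scales rows of the main network's fixed weights,
# rather than producing them outright. Illustrative only.
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    def __init__(self, hyper_dim: int, in_dim: int, out_dim: int):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        # One scale factor per output row, predicted from the hyper state.
        self.scale_gen = nn.Linear(hyper_dim, out_dim)

    def forward(self, z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # z: (batch, hyper_dim) hypernetwork state; x: (batch, in_dim) input
        scale = self.scale_gen(z)        # (batch, out_dim)
        return scale * self.base(x)      # row-scaled projection

out = ScaledLinear(32, 256, 512)(torch.randn(4, 32), torch.randn(4, 256))
```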
“…Lifelong learning [6,41] tackles this issue by giving models the ability to learn continuously over time and accumulate knowledge from streams of information sampled across domains, whether previously observed or not. The three common lifelong learning approaches are [41]: 1) regularization, which constrains the objective function with a forget cost term [22,26,48]; 2) network expansion, which adapts the network architecture to new tasks by adding neurons and layers [5,43]; and 3) memory models, which retrain the network on instances selected from a memory drawn from different data distributions [2,32].…”
Section: Background and Related Work
confidence: 99%
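The "forget cost term" in approach 1) above typically penalizes drift from parameters learned on earlier tasks, weighted by a per-parameter importance estimate (as in elastic weight consolidation). A minimal sketch, with hypothetical names:

```python
# Sketch of a forget-cost regularizer in the spirit of EWC: penalize
# movement away from old-task parameters, weighted by importance.
import torch

def forget_cost(model, old_params, importance, lam: float = 1.0) -> torch.Tensor:
    # old_params / importance: dicts of tensors keyed by parameter name,
    # snapshotted after training on the previous task, e.g.
    #   old = {n: p.detach().clone() for n, p in model.named_parameters()}
    # with importance given by a Fisher-information estimate.
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return lam / 2.0 * penalty

# Total loss on the new task would then be:
#   loss = task_loss + forget_cost(model, old_params, importance)
```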