Expert Gate: Lifelong Learning with a Network of Experts

Aljundi, Rahaf; Chakravarty, Punarjay; Tuytelaars, Tinne

doi:10.1109/cvpr.2017.753

Cited by 415 publications

(310 citation statements)

References 32 publications

Supporting

Mentioning

298

Contrasting


“…Our proposed method is a data-based approach, but it is different from prior works [3,12,24,33], because their model commonly learns with the task-wise local distillation loss in Eq. (2). We emphasize that local distillation only preserves the knowledge within each of the previous tasks, while global distillation does the knowledge over all tasks.…”

Section: Related Work (mentioning)

confidence: 99%

Overcoming Catastrophic Forgetting With Unlabeled Data in the Wild

Lee, Shin, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

136


Lifelong learning with deep neural networks is well known to suffer from catastrophic forgetting: the performance on previous tasks drastically degrades when learning a new task. To alleviate this effect, we propose to leverage a large stream of unlabeled data easily obtainable in the wild. In particular, we design a novel class-incremental learning scheme with (a) a new distillation loss, termed global distillation, (b) a learning strategy to avoid overfitting to the most recent task, and (c) a confidence-based sampling method to effectively leverage unlabeled external data. Our experimental results on various datasets, including CIFAR and ImageNet, demonstrate the superiority of the proposed methods over prior methods, particularly when a stream of unlabeled data is accessible: our method shows up to 15.8% higher accuracy and 46.5% less forgetting compared to the state-of-the-art method. The code is available at https://github. com

B. We design a 3-step learning scheme to improve the effectiveness of global distillation: (i) training a teacher specialized for the current task, (ii) training a model by distilling the knowledge of the previous model, the teacher learned in (i), and their ensemble, and (iii) fine-tuning to avoid overfitting to the current task.

C. We propose a confidence-based sampling method to effectively leverage a large stream of unlabeled data.
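The contrast between task-wise local distillation and global distillation drawn in the citation statement above can be illustrated with a toy example. This is only a sketch under stated assumptions, not the cited paper's implementation: the function names, the plain softmax cross-entropy used as the distillation loss, and the toy logits are all hypothetical. It shows a student whose outputs match the teacher within each previous task but are rescaled across tasks, which a per-task loss cannot detect.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(teacher_logits, student_logits):
    """Cross-entropy between the teacher's and the student's softmax outputs."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def local_distillation(teacher_logits, student_logits, tasks):
    """Sum of per-task losses: each softmax sees only one task's classes."""
    return sum(
        soft_cross_entropy([teacher_logits[c] for c in classes],
                           [student_logits[c] for c in classes])
        for classes in tasks
    )

def global_distillation(teacher_logits, student_logits, tasks):
    """A single loss over the union of all previous tasks' classes."""
    all_classes = [c for classes in tasks for c in classes]
    return soft_cross_entropy([teacher_logits[c] for c in all_classes],
                              [student_logits[c] for c in all_classes])

# Hypothetical setup: two previous tasks with two classes each.
tasks = [[0, 1], [2, 3]]
teacher = [2.0, 0.5, 1.0, -1.0]
# The student agrees with the teacher inside each task,
# but shifts every logit of the second task by a constant.
student = [2.0, 0.5, 4.0, 2.0]

# Extra loss relative to a student that copies the teacher exactly.
local_gap = (local_distillation(teacher, student, tasks)
             - local_distillation(teacher, teacher, tasks))
global_gap = (global_distillation(teacher, student, tasks)
              - global_distillation(teacher, teacher, tasks))

print("local gap:", local_gap)
print("global gap:", global_gap)
```

In this toy case the local gap is zero up to rounding, because a softmax restricted to one task is unchanged by a constant shift of that task's logits, while the global gap is positive because the shift changes how probability is shared between tasks.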



“…For example, if we have learned "lion" and "tiger" earlier, that can help us later in time to learn a "liger" (a rare hybrid cross between a male lion and a female tiger), even with just a few examples. Relating this to point (2) above, this further allows compositional lifelong learning to help recognize new facts (e.g. <dog, riding, wave>) based on facts seen earlier in time (e.g.…”

Section: Concepts Of Varying Complexity (mentioning)

confidence: 95%

“…We argue that the evaluation of LLL methods should be reconsidered. In the standard LLL (with a few notable exceptions, such as [2,4]), the trained models are judged by their capability to recognize each task's categories individually assuming the absence of the categories covered by the remaining tasks - not necessarily realistic. Although the performance of each task in isolation is an important characteristic, it might be deceiving.…”

Section: Concepts Of Varying Complexity (mentioning)

confidence: 99%

Exploring the Challenges Towards Lifelong Fact Learning

Elhoseiny, Babiloni, Aljundi, et al. 2019

Computer Vision – ACCV 2018

Self Cite


So far life-long learning (LLL) has been studied in relatively small-scale and relatively artificial setups. Here, we introduce a new large-scale alternative. What makes the proposed setup more natural and closer to human-like visual systems is threefold: First, we focus on concepts (or facts, as we call them) of varying complexity, ranging from single objects to more complex structures such as objects performing actions, and objects interacting with other objects. Second, as in real-world settings, our setup has a long-tail distribution, an aspect which has mostly been ignored in the LLL context. Third, facts across tasks may share structure (e.g., <person, riding, wave> and <dog, riding, wave>). Facts can also be semantically related (e.g., "liger" relates to seen categories like "tiger" and "lion"). Given the large number of possible facts, an LLL setup seems a natural choice. To avoid model size growing over time and to optimally exploit the semantic relations and structure, we combine it with a visual semantic embedding instead of discrete class labels. We adapt existing datasets with the properties mentioned above into new benchmarks, by dividing them semantically or randomly into disjoint tasks. This leads to two large-scale benchmarks with 906,232 images and 165,150 unique facts, on which we evaluate and analyze state-of-the-art LLL methods.


“…Some of the aforementioned methods make use of conditional computation, i.e. the gating network selects a subset of experts to evaluate while others stay idle [51,19,3]. While this is computationally efficient, routing errors can occur, i.e.…”

Section: Related Work (mentioning)

confidence: 99%

Expert Sample Consensus Applied to Camera Re-Localization

Brachmann, Rother 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Fitting model parameters to a set of noisy data points is a common problem in computer vision. In this work, we fit the 6D camera pose to a set of noisy correspondences between the 2D input image and a known 3D environment. We estimate these correspondences from the image using a neural network. Since the correspondences often contain outliers, we utilize a robust estimator such as Random Sample Consensus (RANSAC) or Differentiable RANSAC (DSAC) to fit the pose parameters. When the problem domain, e.g. the space of all 2D-3D correspondences, is large or ambiguous, a single network does not cover the domain well. Mixture of Experts (MoE) is a popular strategy to divide a problem domain among an ensemble of specialized networks, so-called experts, where a gating network decides which expert is responsible for a given input. In this work, we introduce Expert Sample Consensus (ESAC), which integrates DSAC in an MoE. Our main technical contribution is an efficient method to train ESAC jointly and end-to-end. We demonstrate experimentally that ESAC handles two real-world problems better than competing methods, i.e. scalability and ambiguity. We apply ESAC to fitting simple geometric models to synthetic images, and to camera re-localization for difficult, real datasets.
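The conditional computation mentioned in the citation statement above, where a gating network decides which expert handles a given input while the others stay idle, can be sketched as follows. This is a minimal illustration under stated assumptions and not the ESAC or Expert Gate implementation: the `route` function, the toy scalar experts, and the hand-written `gate` are all hypothetical stand-ins for trained networks.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(x, gate, experts):
    """Hard gating: evaluate only the expert with the highest gate probability."""
    probs = softmax(gate(x))
    chosen = max(range(len(experts)), key=lambda i: probs[i])
    return chosen, probs, experts[chosen](x)

# Records which experts were actually evaluated.
calls = []

def make_expert(name, fn):
    """Wrap a function so that every evaluation is logged in `calls`."""
    def expert(x):
        calls.append(name)
        return fn(x)
    return expert

# Hypothetical experts for a scalar input.
experts = [
    make_expert("negative", lambda x: -x),
    make_expert("positive", lambda x: 2 * x),
]

def gate(x):
    """Hypothetical gate: prefers expert 0 for negative inputs, expert 1 otherwise."""
    return [-x, x]

chosen, probs, y = route(3.0, gate, experts)
print(chosen, y, calls)
```

Only the selected expert runs, which is where the computational saving comes from. The same property means the output depends entirely on the gate's choice: if the gate picks an ill-suited expert, no other expert is consulted to correct it.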

