A comprehensive list of papers about 'A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning'.
Forgetting refers to the loss or deterioration of previously acquired knowledge. While existing surveys on forgetting have focused primarily on continual learning, forgetting is a prevalent phenomenon in many other research domains within deep learning. For example, it manifests in generative models due to generator shifts, and in federated learning due to heterogeneous data distributions across clients. Addressing forgetting involves several challenges, including balancing the retention of old-task knowledge with fast learning of new tasks, managing task interference under conflicting goals, and preventing privacy leakage. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword that can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context, we aim to present a more nuanced understanding of this phenomenon and to highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing on ideas and approaches from the various fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, we hope to encourage the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications.
If you find our paper or this resource helpful, please consider citing:
@article{Forgetting_Survey_2024,
  title={A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning},
  author={Wang, Zhenyi and Yang, Enneng and Shen, Li and Huang, Heng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  publisher={IEEE}
}
Thanks!
- Harmful Forgetting
- Beneficial Forgetting
Harmful forgetting occurs when we want a machine learning model to retain previously learned knowledge while adapting to new tasks, domains, or environments. In such cases, it is important to prevent or mitigate knowledge forgetting.
Problem Setting | Goal | Source of forgetting |
---|---|---|
Continual Learning | learn non-stationary data distribution without forgetting previous knowledge | data-distribution shift during training |
Foundation Model | unsupervised learning on large-scale unlabeled data | data-distribution shift in pre-training, fine-tuning |
Domain Adaptation | adapt to the target domain while maintaining performance on the source domain | target domains shift sequentially over time |
Test-time Adaptation | mitigate the distribution gap between training and testing | adaptation to the test data distribution during testing |
Meta-Learning | learn adaptable knowledge to new tasks | incrementally meta-learn new classes / task-distribution shift |
Generative Model | learn a generator to approximate the real data distribution | generator shift/data-distribution shift |
Reinforcement Learning | maximize accumulated rewards | shifts in states, actions, rewards, and state-transition dynamics |
Federated Learning | decentralized training without sharing data | model averaging; non-IID data; data-distribution shift |
Links:
Forgetting in Continual Learning |
Forgetting in Foundation Models |
Forgetting in Domain Adaptation |
Forgetting in Test-Time Adaptation |
Forgetting in Meta-Learning |
Forgetting in Generative Models |
Forgetting in Reinforcement Learning |
Forgetting in Federated Learning
The goal of continual learning (CL) is to learn on a sequence of tasks without forgetting the knowledge on previous tasks.
Links: Task-aware CL | Task-free CL | Online CL | Semi-supervised CL | Few-shot CL | Unsupervised CL | Theoretical Analysis
Task-aware CL focuses on addressing scenarios where explicit task definitions, such as task IDs or labels, are available during the CL process. Existing methods on task-aware CL have explored five main branches: Memory-based Methods | Architecture-based Methods | Regularization-based Methods | Subspace-based Methods | Bayesian Methods.
Memory-based (or rehearsal-based) methods keep a memory buffer that stores examples/knowledge from previous tasks and replay those examples while learning new tasks.
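A common way to fill such a buffer from a data stream is reservoir sampling, which keeps a uniform random subset of everything seen so far, so replayed batches approximate the full history. A minimal sketch in plain Python (the `ReservoirBuffer` class and its method names are illustrative, not from any particular paper):

```python
import random

class ReservoirBuffer:
    """Fixed-size memory buffer filled via reservoir sampling.

    Maintains a uniform random subset of every example seen so far,
    so replayed batches approximate the full data stream.
    """
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Keep the new example with probability capacity / n_seen,
            # replacing a uniformly chosen stored example.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, batch_size):
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(self.buffer, k)

buf = ReservoirBuffer(capacity=50)
for task_id in range(3):           # three sequential tasks
    for i in range(1000):
        buf.add((task_id, i))      # (task, example) pairs
replay_batch = buf.sample(16)
```

During training, the replayed examples are typically mixed into each new-task mini-batch so that gradient updates reflect both old and new data.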
The architecture-based approach avoids forgetting by reducing parameter sharing between tasks or adding parameters to new tasks.
Regularization-based approaches avoid forgetting by penalizing updates to important parameters or by distilling knowledge from the previous model as a teacher.
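One well-known instance of the parameter-importance idea is an EWC-style (Elastic Weight Consolidation) quadratic penalty, which anchors each parameter to its old-task value in proportion to an importance estimate such as the diagonal Fisher information. A toy sketch with made-up numbers (the Fisher values here are invented purely for illustration):

```python
def ewc_penalty(params, old_params, fisher, lam):
    """Quadratic penalty anchoring parameters important to old tasks.

    fisher[i] estimates how sensitive the old-task loss is to params[i];
    lam trades off stability (retaining old tasks) against plasticity.
    """
    return 0.5 * lam * sum(
        f * (p - p_old) ** 2
        for p, p_old, f in zip(params, old_params, fisher)
    )

old = [1.0, -2.0, 0.5]     # parameters after learning the old task
fisher = [5.0, 0.1, 2.0]   # per-parameter importance (illustrative)
new = [1.2, 0.0, 0.5]      # candidate parameters for the new task
penalty = ewc_penalty(new, old, fisher, lam=10.0)
```

In practice this penalty is added to the new task's loss, and the Fisher terms are estimated from gradients on old-task data before moving on.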
Subspace-based methods perform CL in multiple disjoint subspaces to avoid interference between multiple tasks.
Bayesian methods provide a principled probabilistic framework for addressing forgetting.
Task-free CL refers to the scenario in which the learning system does not have access to any explicit task information.
In online CL, the learner is only allowed to process the data for each task once.
The presence of imbalanced data streams in CL (especially online CL) has drawn significant attention, primarily due to its prevalence in real-world application scenarios.
Semi-supervised CL is an extension of traditional CL that allows each task to incorporate unlabeled data as well.
Few-shot CL refers to the scenario where a model needs to learn new tasks with only a limited number of labeled examples per task while retaining knowledge from previously encountered tasks.
Unsupervised CL (UCL) assumes that only unlabeled data is provided to the CL learner.
Paper Title | Year | Conference/Journal |
---|---|---|
Class-Incremental Unsupervised Domain Adaptation via Pseudo-Label Distillation | 2024 | TIP |
Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning | 2024 | WACV |
Unsupervised Continual Learning in Streaming Environments | 2023 | TNNLS |
Representational Continuity for Unsupervised Continual Learning | 2022 | ICLR |
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning | 2022 | CVPR |
Unsupervised Continual Learning for Gradually Varying Domains | 2022 | CVPRW |
Co2L: Contrastive Continual Learning | 2021 | ICCV |
Unsupervised Progressive Learning and the STAM Architecture | 2021 | IJCAI |
Continual Unsupervised Representation Learning | 2019 | NeurIPS |
Theory or analysis of continual learning
Foundation models are large machine learning models trained on a vast quantity of data at scale, such that they can be adapted to a wide range of downstream tasks.
Links: Forgetting in Fine-Tuning Foundation Models | Forgetting in One-Epoch Pre-training | CL in Foundation Model
When fine-tuning a foundation model, there is a tendency to forget the pre-trained knowledge, resulting in sub-optimal performance on downstream tasks.
Foundation models are often trained on their pre-training corpus in a single pass. As a result, the earlier examples encountered during pre-training may be overwritten or forgotten more quickly than the later examples.
Paper Title | Year | Conference/Journal |
---|---|---|
Efficient Continual Pre-training of LLMs for Low-resource Languages | 2024 | Arxiv |
Exploring Forgetting in Large Language Model Pre-Training | 2024 | Arxiv |
Measuring Forgetting of Memorized Training Examples | 2023 | ICLR |
Quantifying Memorization Across Neural Language Models | 2023 | ICLR |
Analyzing leakage of personally identifiable information in language models | 2023 | S&P |
How Well Does Self-Supervised Pre-Training Perform with Streaming Data? | 2022 | ICLR |
The challenges of continuous self-supervised learning | 2022 | ECCV |
Continual contrastive learning for image classification | 2022 | ICME |
By leveraging the powerful feature extraction capabilities of foundation models, researchers have been able to explore new avenues for advancing continual learning techniques.
The goal of domain adaptation is to transfer the knowledge from a source domain to a target domain.
Test time adaptation (TTA) refers to the process of adapting a pre-trained model on-the-fly to unlabeled test data during inference or testing.
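A widely used TTA objective is to minimize the entropy of the model's predictions on unlabeled test data (as in Tent-style methods), updating only a small set of parameters. The toy sketch below adapts a single logit-scale parameter by finite-difference gradient descent; the function names and numbers are illustrative, not a real model:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def prediction_entropy(logits):
    """Entropy of the softmax prediction: low entropy = confident."""
    return -sum(p * math.log(p) for p in softmax(logits))

def adapt_scale(logits, scale=1.0, lr=0.1, steps=20, eps=1e-4):
    """Minimize prediction entropy w.r.t. a single scale parameter,
    a stand-in for tuning normalization parameters at test time."""
    for _ in range(steps):
        h = lambda s: prediction_entropy([z * s for z in logits])
        grad = (h(scale + eps) - h(scale - eps)) / (2 * eps)
        scale -= lr * grad
    return scale

test_logits = [2.0, 1.0, 0.2]      # one unlabeled test example
before = prediction_entropy(test_logits)
scale = adapt_scale(test_logits)
after = prediction_entropy([z * scale for z in test_logits])
```

The same loop also shows where forgetting enters: the adapted parameters drift away from their training-time values, which can degrade performance if the model later sees data from the original distribution.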
Meta-learning, also known as learning to learn, focuses on developing algorithms and models that can learn from previous learning experiences to improve their ability to learn new tasks or adapt to new domains more efficiently and effectively.
Links: Incremental Few-Shot Learning | Continual Meta-Learning
Incremental few-shot learning (IFSL) focuses on the challenge of learning new categories with limited labeled data while retaining knowledge about previously learned categories.
The goal of continual meta-learning (CML) is to address the challenge of forgetting in non-stationary task distributions.
Paper Title | Year | Conference/Journal |
---|---|---|
Meta Continual Learning Revisited: Implicitly Enhancing Online Hessian Approximation via Variance Reduction | 2024 | ICLR |
Recasting Continual Learning as Sequence Modeling | 2023 | NeurIPS |
Adaptive Compositional Continual Meta-Learning | 2023 | ICML |
Learning to Learn and Remember Super Long Multi-Domain Task Sequence | 2022 | CVPR |
Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions | 2022 | ECCV |
Variational Continual Bayesian Meta-Learning | 2021 | NeurIPS |
Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness | 2021 | ICCV |
Addressing Catastrophic Forgetting in Few-Shot Problems | 2020 | ICML |
Continuous meta-learning without tasks | 2020 | NeurIPS |
Reconciling meta-learning and continual learning with online mixtures of tasks | 2019 | NeurIPS |
Fast Context Adaptation via Meta-Learning | 2019 | ICML |
Online meta-learning | 2019 | ICML |
The goal of a generative model is to learn a generator that can generate samples from a target distribution.
Links: GAN Training is a Continual Learning Problem | Lifelong Learning of Generative Models
This line of work treats GAN training itself as a continual learning problem: the generator's output distribution shifts throughout training, so the discriminator faces a non-stationary data stream and can forget how to handle earlier generated samples.
The goal is to develop generative models that can continually generate high-quality samples for both new and previously encountered tasks.
Reinforcement learning is a machine learning technique that allows an agent to learn how to behave in an environment by trial and error, through rewards and punishments.
Federated learning (FL) is a decentralized machine learning approach where the training process takes place on local devices or edge servers instead of a centralized server.
Links: Forgetting Due to Non-IID Data in FL | Federated Continual Learning
This branch pertains to the forgetting problem caused by the inherent non-IID (not identically and independently distributed) data among different clients participating in FL.
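At the heart of FedAvg-style FL is a server-side weighted average of client models; with non-IID data, clients can pull shared parameters in opposite directions, so the average dilutes or erases what each client learned locally. A minimal sketch with flat parameter lists and made-up numbers (illustrative, not a full FL system):

```python
def fedavg(client_weights, client_sizes):
    """Server-side weighted average of client model parameters.

    This is the 'model averaging' step where conflicting non-IID
    client updates can cancel each other out.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients whose non-IID data pull the parameters in opposite
# directions: the averaged model retains neither client's knowledge.
global_w = fedavg([[1.0, -1.0], [-1.0, 1.0]], client_sizes=[100, 100])
```

Many of the methods in this section can be read as ways to make this averaging step less destructive, e.g. by regularizing local updates toward the global model.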
This branch addresses the issue of continual learning within each individual client in the federated learning process, which results in forgetting at the overall FL level.
Beneficial forgetting arises when the model contains private information that could lead to privacy breaches, or when irrelevant information hinders the learning of new tasks. In these situations, forgetting becomes desirable because it helps protect privacy and facilitates efficient learning by discarding unnecessary information.
Problem Setting | Goal |
---|---|
Mitigate Overfitting | mitigate memorization of training data through selective forgetting |
Debias and Forget Irrelevant Information | forget biased information to achieve better performance or remove irrelevant information to learn new tasks |
Machine Unlearning | forget some specified training data to protect user privacy |
Links: Combat Overfitting Through Forgetting | Learning New Knowledge Through Forgetting Previous Knowledge | Machine Unlearning
Overfitting in neural networks occurs when the model excessively memorizes the training data, leading to poor generalization. One line of remedies is to selectively forget irrelevant or noisy information.
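Several of the papers below (e.g. on random forgetting and reinitialization) build on the idea of deliberately resetting part of the network and relearning it, so the network must rediscover more general features instead of keeping memorized ones. A toy sketch of the reset step (the function and constants are illustrative, not from a specific paper):

```python
import random

def reinitialize_fraction(weights, fraction, seed=0):
    """Reset a random fraction of weights to small fresh values,
    deliberately 'forgetting' part of what was memorized."""
    rng = random.Random(seed)
    out = list(weights)
    k = int(len(out) * fraction)
    for i in rng.sample(range(len(out)), k):
        out[i] = rng.uniform(-0.01, 0.01)
    return out

w = [0.9] * 10                              # weights after overfitting
w_reset = reinitialize_fraction(w, fraction=0.3)
# training then resumes from the partially reset weights
```

In practice the reset is often targeted (e.g. the last layers, or weights deemed unimportant) and applied periodically during training.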
Paper Title | Year | Conference/Journal |
---|---|---|
"Forgetting" in Machine Learning and Beyond: A Survey | 2024 | Arxiv |
The Effectiveness of Random Forgetting for Robust Generalization | 2024 | ICLR |
Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier | 2023 | ICLR |
The Primacy Bias in Deep Reinforcement Learning | 2022 | ICML |
The Impact of Reinitialization on Generalization in Convolutional Neural Networks | 2021 | Arxiv |
Learning with Selective Forgetting | 2021 | IJCAI |
SIGUA: Forgetting May Make Learning with Noisy Labels More Robust | 2020 | ICML |
Invariant Representations through Adversarial Forgetting | 2020 | AAAI |
Forget a Bit to Learn Better: Soft Forgetting for CTC-based Automatic Speech Recognition | 2019 | Interspeech |
"Learning to forget" suggests that not all previously acquired prior knowledge is helpful for learning new tasks.
Machine unlearning, a recent area of research, addresses the need to forget previously learned training data in order to protect user data privacy.
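For very simple models, unlearning can sometimes be done exactly by removing a training point's contribution in closed form instead of retraining from scratch. A toy example where the "model" is just the mean of the training data (illustrative only; for neural networks no such closed form exists, which is what makes machine unlearning hard):

```python
def fit_mean(data):
    """A trivially simple 'model': the mean of the training data."""
    return sum(data) / len(data)

def unlearn_point(mean, n, removed):
    """Exactly remove one training point's contribution in O(1),
    yielding the model that retraining without it would produce."""
    return (mean * n - removed) / (n - 1)

data = [2.0, 4.0, 6.0, 8.0]
model = fit_mean(data)                          # trained on all four points
forgot = unlearn_point(model, len(data), 8.0)   # as if 8.0 was never seen
```

Exact unlearning of this kind gives a formal privacy guarantee; approximate unlearning methods instead try to cheaply reach a model close to the retrained one.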
Star History
Contact
We welcome all researchers to contribute to this repository on forgetting in deep learning.
Email: [email protected] | [email protected]