2 min read

Constrained Entropic Unlearning: A New Framework for Efficiently Removing Sensitive Data from LLMs

Large Language Models (LLMs) are increasingly deployed in real-world applications, but they often contain sensitive, outdated, or proprietary information that needs to be removed without retraining from scratch. A new paper, Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models, introduces a novel approach to this challenge, offering a more stable and efficient method for selective forgetting while preserving model utility.

The Problem with Current Unlearning Methods

Existing unlearning techniques typically treat forgetting and retention as a trade-off, combining both objectives into a single loss function. This scalarized approach often leads to unstable optimization and degraded performance on retained data, especially under aggressive forgetting. The authors identify three key issues:

  1. Over-aggressive forgetting: Pushing the forget loss to near-zero values can unnecessarily degrade model utility.
  2. Conflicting gradients: Simultaneously optimizing for forgetting and retention creates conflicting gradient directions, slowing convergence.
  3. Unbounded losses: Conventional loss functions like cross-entropy are unbounded, leading to unstable gradients during unlearning (see the note after this list).
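
On that third point: gradient-ascent-style unlearning maximizes the standard next-token cross-entropy on the forget set, and that objective has no ceiling (notation here is generic, not the paper's):

$$
\mathrm{CE}(x, y) = -\log p_\theta(y \mid x) \;\longrightarrow\; \infty \quad \text{as} \quad p_\theta(y \mid x) \to 0.
$$

There is no natural stopping point, so aggressive optimization can push the weights arbitrarily far from the original model, which is exactly the instability the constrained formulation below is designed to avoid.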

A Constrained Optimization Approach

The paper proposes a conceptual shift: framing unlearning as a constrained optimization problem. Here, forgetting is enforced via a novel logit-margin flattening loss, which drives the model's output distribution toward uniformity on the forget set. Retention is preserved through a hard constraint on a separate retain set, ensuring performance doesn’t degrade beyond a user-specified threshold.
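
In generic notation (the symbols here are illustrative rather than the paper's exact formulation), the problem has the shape

$$
\min_{\theta} \; \mathcal{L}_{\text{forget}}(\theta; D_f) \quad \text{s.t.} \quad \mathcal{L}_{\text{retain}}(\theta; D_r) \;\le\; \mathcal{L}_{\text{retain}}(\theta_0; D_r) + \varepsilon,
$$

where $D_f$ is the forget set, $D_r$ the retain set, $\theta_0$ the original weights, and $\varepsilon$ the user-specified slack on retention performance. The dual variable attached to this constraint is what the solver described below adjusts on the fly.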

Key innovations:

  • Logit-margin flattening loss: Unlike entropy-based methods, this loss avoids softmax computations, improving numerical stability and maintaining non-vanishing gradients.
  • Primal-dual algorithm: A scalable solver dynamically adjusts the trade-off between forgetting and retention via dual variables, enabling efficient optimization even for large LLMs.

How It Works

The method decomposes unlearning into two components:

  1. Forgetting: The logit-margin flattening loss penalizes peakedness in the model’s pre-softmax logits, encouraging uniform predictions on the forget set.
  2. Retention: A cross-entropy loss on the retain set is constrained to stay within a small margin of the original model’s performance (a rough sketch of both pieces follows this list).
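
The exact form of the logit-margin flattening loss is defined in the paper; as a hedged illustration of the idea (flat logits, no softmax), one could write something like the following, where the function names and the specific margin measure are illustrative choices rather than the authors':

```python
import torch
import torch.nn.functional as F


def logit_flatness_loss(logits: torch.Tensor) -> torch.Tensor:
    """Illustrative flattening penalty on pre-softmax logits.

    NOTE: a plausible reading of the idea, not the paper's exact loss.
    The gap between the largest logit and the mean logit is zero only
    when every logit is equal (i.e., the prediction is uniform), and
    computing it never touches a softmax.
    """
    # logits: (batch, seq_len, vocab_size)
    margin = logits.max(dim=-1).values - logits.mean(dim=-1)
    return margin.mean()


def retain_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Standard next-token cross-entropy on the retain set."""
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
```

Because the penalty is a difference of raw logits rather than a log-probability, its gradient stays roughly constant in size as the distribution flattens, which is one way to read the non-vanishing-gradient claim above.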

The primal-dual solver alternates between two steps (sketched in code after this list):

  • Updating model parameters to descend on the forget loss plus a dual-weighted penalty on the retention constraint.
  • Adjusting the dual variable to penalize constraint violations (i.e., if retention performance degrades too much).
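
A minimal sketch of that loop, assuming a Hugging Face-style model whose forward pass returns `.logits`, the illustrative losses from the previous section, and hyperparameter names (`retain_budget`, `dual_lr`) that are assumptions rather than the paper's:

```python
def primal_dual_step(model, optimizer, forget_batch, retain_batch,
                     retain_budget: float, lam: float, dual_lr: float = 0.05) -> float:
    """One illustrative primal-dual update; not the paper's exact algorithm.

    Primal step: gradient step on the Lagrangian
        L_forget + lam * (L_retain - retain_budget).
    Dual step: raise lam when the retention constraint is violated,
    lower it (never below zero) when there is slack.
    """
    forget_logits = model(forget_batch["input_ids"]).logits
    retain_logits = model(retain_batch["input_ids"]).logits

    l_forget = logit_flatness_loss(forget_logits)
    l_retain = retain_loss(retain_logits, retain_batch["labels"])

    # Primal update: adjust model weights against the current Lagrangian.
    lagrangian = l_forget + lam * (l_retain - retain_budget)
    optimizer.zero_grad()
    lagrangian.backward()
    optimizer.step()

    # Dual update: projected gradient ascent on the multiplier.
    lam = max(0.0, lam + dual_lr * (l_retain.item() - retain_budget))
    return lam
```

The dual variable plays the role of the trade-off weight that scalarized methods fix by hand: when retention drifts past the budget, `lam` grows and the next primal step prioritizes recovering retain-set performance; once the constraint has slack again, it decays back toward zero and the update concentrates on forgetting.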

Results: Better Forgetting, Preserved Utility

The authors evaluated their method on the TOFU (Task of Fictitious Unlearning) and MUSE (Machine Unlearning Six-way Evaluation) benchmarks across multiple LLMs, including LLaMA 2, LLaMA 3, and Gemma. Key findings:

  • Higher forget success: Achieved better removal of targeted data compared to baselines like Gradient Ascent, DPO, and NPO.
  • Maintained utility: Retention performance stayed close to the original model, avoiding catastrophic forgetting.
  • Stable optimization: The logit-margin loss prevented gradient explosions seen in entropy-based methods.

Why This Matters

As LLMs are increasingly used in regulated industries (e.g., healthcare, finance), the ability to efficiently remove sensitive or copyrighted data is critical. This work provides:

  • Transparent trade-offs: Users explicitly control retention degradation via a constraint, rather than tuning an opaque regularization weight.
  • Scalability: The primal-dual algorithm avoids costly inner-loop optimizations, making it feasible for billion-parameter models.
  • Robustness: The method resists over-forgetting, which can make models vulnerable to adversarial relearning attacks.

Limitations and Future Work

The authors note a slight drop in fluency on the forget set, likely due to the strong uniformity induced by logit flattening. Future directions include:

  • Hybrid losses to balance fluency and forgetting.
  • Resilience testing against relearning attacks.
  • Extending the framework to continual unlearning scenarios.

The Bottom Line

This paper reframes LLM unlearning as a constrained optimization problem, offering a more principled and scalable solution than prior methods. For businesses deploying LLMs, it provides a way to comply with data removal requests (e.g., GDPR’s right to be forgotten) without sacrificing model performance—a critical step toward responsible AI deployment.

Read the full paper on arXiv: Constrained Entropic Unlearning