OFTv2: Making Orthogonal Finetuning Faster and More Scalable for AI Models

Orthogonal finetuning (OFT) has emerged as a powerful technique for adapting large foundation models to downstream tasks while preventing catastrophic forgetting. However, its high computational and memory costs have limited its practical deployment—until now. A new paper titled Orthogonal Finetuning Made Scalable introduces OFTv2, a reformulation that dramatically improves the efficiency of this approach.

The Problem with Original OFT

Traditional OFT works by learning orthogonal matrices that multiply the pretrained weight matrices of each layer. While effective at preserving pretrained knowledge, this weight-centric implementation relies on costly matrix-matrix multiplications with cubic complexity, making it impractical for large models. The authors identify this as the core bottleneck that OFTv2 is designed to remove.

OFTv2: A Computational Breakthrough

The key innovation in OFTv2 is a shift from a weight-centric to an input-centric implementation. Instead of merging the learned orthogonal matrices into the weight matrices during training, OFTv2 applies these transformations directly to the input vectors in each forward pass. Replacing matrix-matrix products with matrix-vector products reduces the computational cost from cubic to quadratic in the layer width.
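To make the cubic-versus-quadratic point concrete, here is a minimal PyTorch sketch of the two orderings. It is not the paper's code: Q is a random stand-in for the learned orthogonal matrix, and applying Q on the input side of the frozen weight W is a convention assumed purely for illustration.

```python
import torch

torch.manual_seed(0)
d, b = 256, 8                                   # layer width, number of input vectors
W = torch.randn(d, d, dtype=torch.float64)      # frozen pretrained weight
Q = torch.linalg.qr(torch.randn(d, d, dtype=torch.float64)).Q  # stand-in orthogonal matrix
x = torch.randn(b, d, dtype=torch.float64)      # a batch of input activations

# Weight-centric (original OFT): merge Q into W on every training step.
# W @ Q is a (d x d)(d x d) product, i.e. O(d^3) per layer per step.
y_weight_centric = x @ (W @ Q).T

# Input-centric (OFTv2): rotate the activations and leave W untouched.
# Each product is "matrix times a handful of vectors", i.e. O(b * d^2).
y_input_centric = (x @ Q.T) @ W.T

# The two orderings compute the same function; only the cost differs.
assert torch.allclose(y_weight_centric, y_input_centric)
```

Since the number of activation vectors per step is tiny compared with the layer width of a large model, avoiding the d × d matrix product is where the savings come from.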

The paper also introduces the Cayley–Neumann parameterization, which approximates matrix inversion in the Cayley transform using a truncated Neumann series. This modification improves numerical stability while further reducing computational overhead.
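The sketch below shows the general shape of this idea: a Cayley transform built from a skew-symmetric matrix, with the matrix inverse replaced by a few Neumann terms. The sign convention, the number of terms, and the use of a dense (rather than block-diagonal) matrix are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def cayley_neumann(A: torch.Tensor, num_terms: int = 5) -> torch.Tensor:
    """Approximate the Cayley transform Q = (I + A)(I - A)^{-1} for skew-symmetric A,
    replacing the exact inverse with a truncated Neumann series
    (I - A)^{-1} ~= I + A + A^2 + ... (illustrative sketch only)."""
    I = torch.eye(A.shape[0], dtype=A.dtype, device=A.device)
    inv_approx = I.clone()
    for _ in range(num_terms):          # Horner-style accumulation of the series
        inv_approx = I + A @ inv_approx
    return (I + A) @ inv_approx

# Small skew-symmetric parameter so the Neumann series converges quickly.
P = 0.005 * torch.randn(64, 64, dtype=torch.float64)
A = P - P.T
Q = cayley_neumann(A)

# Near-orthogonality check: Q^T Q should be close to the identity.
err = (Q.T @ Q - torch.eye(64, dtype=torch.float64)).abs().max().item()
print(f"max |Q^T Q - I| = {err:.1e}")
```

The appeal is that a few matrix products replace an explicit matrix inversion, which is both cheaper and numerically gentler, at the cost of Q being only approximately orthogonal.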

Quantized Model Support

Because large models often have to be quantized to fit into GPU memory, the authors extended OFTv2 to work directly with quantized foundation models. The resulting method, called QOFT, outperforms popular approaches such as QLoRA in training stability, efficiency, and memory usage.
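The module below is a rough sketch of how an input-centric orthogonal adapter can sit in front of a quantized layer; it is not the authors' QOFT implementation or any library's API. The class name, the dense orthogonal matrix (the real method keeps parameter counts low with block-diagonal structure), and the generic base_linear argument (which could be, say, a 4-bit bitsandbytes layer) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QOFTLinearSketch(nn.Module):
    """Illustrative only: an input-side orthogonal adapter wrapped around a
    frozen (possibly quantized) base linear layer. Only the small
    skew-symmetric parameter is trained; the base weights are never
    de-quantized or merged."""

    def __init__(self, base_linear: nn.Module, in_features: int, num_terms: int = 5):
        super().__init__()
        self.base_linear = base_linear                    # frozen / quantized weights
        for p in self.base_linear.parameters():
            p.requires_grad_(False)
        # Zero init => Q starts as the identity, so finetuning starts from the base model.
        self.skew_param = nn.Parameter(torch.zeros(in_features, in_features))
        self.num_terms = num_terms

    def orthogonal_matrix(self) -> torch.Tensor:
        A = self.skew_param - self.skew_param.T           # skew-symmetric
        I = torch.eye(A.shape[0], device=A.device, dtype=A.dtype)
        inv_approx = I.clone()
        for _ in range(self.num_terms):                   # truncated Neumann series
            inv_approx = I + A @ inv_approx
        return (I + A) @ inv_approx                       # approximate Cayley transform

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        Q = self.orthogonal_matrix().to(x.dtype)
        return self.base_linear(x @ Q.T)                  # rotate inputs, then frozen matmul
```

Keeping the trainable orthogonal parameters in full precision while the heavy base weights stay quantized mirrors, at a high level, how QLoRA-style adapters split precision between trainable and frozen parts.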

Performance Gains

The results are impressive:

  • Up to 10× faster training than the original OFT
  • 3× lower GPU memory usage
  • Comparable or better performance than LoRA/QLoRA with fewer parameters

The method has been tested across model families (BART, Llama-2, Qwen2.5) and sizes from 0.5B to 72B parameters, showing consistent efficiency gains while matching or exceeding the task performance of the baselines.

Why This Matters

As foundation models continue to grow in size and capability, efficient adaptation methods become increasingly critical. OFTv2 represents a significant step forward in making orthogonal finetuning practical for real-world applications, particularly when working with quantized models.

The paper's authors—from Max Planck Institute for Intelligent Systems, CUHK, University of Cambridge, and Alan Turing Institute—have made their code available, paving the way for broader adoption of this technique in the AI community.

For businesses leveraging large language models or other foundation models, OFTv2 offers a more efficient path to customization while maintaining model stability and performance—a combination that could accelerate AI adoption across industries.