FastMEMIT: Slashing Precomputation Time for AI Knowledge Editing by 99.7%
The Hidden Cost of Editing AI Brains
Editing the knowledge inside large language models (LLMs) has become a hot topic in AI research. Techniques like MEMIT, ROME, and EMMET allow us to update facts in transformer models without retraining—think of it like performing brain surgery on an AI. But there's a catch: before you can make any edits, you need to run an expensive "precomputation" step that can take days on a single GPU.
New research from UC Berkeley and University of Virginia reveals we've been massively overcomputing. The team discovered that these editing methods can work just as well with less than 0.3% of the original precomputation requirements, potentially saving hundreds of hours of GPU time when editing new models.
The Precomputation Bottleneck
Current "locate-then-edit" methods like MEMIT require caching hidden representations from millions of Wikipedia tokens before editing can begin. For GPT-J (6B), this means:
- 44 million hidden vectors per layer
- 36 hours on an NVIDIA A6000 GPU
- Similar scaling for larger models (40 hours for Llama2-7B)
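To make the bottleneck concrete, here is a minimal sketch of what a MEMIT-style precomputation pass looks like: run raw text through the model, capture the MLP "key" activations at an edited layer with a forward hook, and accumulate their second-moment statistic over every token. The model choice (GPT2-XL as a lighter stand-in for GPT-J), the layer index, the module path, and the `wiki_texts` placeholder are all illustrative assumptions, not the authors' released pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative settings; the real MEMIT configs target specific edited layers.
MODEL_NAME = "gpt2-xl"   # lighter stand-in for GPT-J (6B) to keep the sketch runnable
LAYER = 17               # hypothetical choice of edited MLP layer

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

d_k = 4 * model.config.n_embd   # dimension of the MLP "key" activations (6,400 for GPT2-XL)
C = torch.zeros(d_k, d_k)       # running statistic: sum over tokens of k k^T
n_vectors = 0
cache = []

# The MLP down-projection receives the key vectors as its input, so hook it.
hook = model.transformer.h[LAYER].mlp.c_proj.register_forward_hook(
    lambda mod, inp, out: cache.append(inp[0].detach().reshape(-1, d_k))
)

# Placeholder corpus; the full recipe streams tens of millions of Wikipedia tokens.
wiki_texts = ["The Eiffel Tower is located in Paris."]

for text in wiki_texts:
    batch = tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        model(**batch)
    k = cache.pop()
    C += k.T @ k            # accumulate k k^T for every token position
    n_vectors += k.shape[0]

hook.remove()
C /= n_vectors              # uncentered covariance of the key vectors
```

Run over the full 44M-token corpus, this loop is what consumes the 36+ GPU-hours quoted above.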
"It's like having to read the entire encyclopedia before you can correct a typo," explains lead researcher Akshat Gupta. "We found this is fundamentally unnecessary."
The Breakthrough: FastMEMIT
The team analyzed the mathematical foundations of editing algorithms and derived the theoretical minimum precomputation needed. For GPT2-XL, the cached key vectors live in the MLP's intermediate space of dimension d_k = 6,400 (4× the hidden dimension of 1,600), so only ~6,400 independent vectors are mathematically required, a tiny fraction of current practice.
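Where does that floor come from? The statistic cached during precomputation is a d_k × d_k matrix built from the key vectors, and its rank cannot exceed the number of vectors accumulated, so it only becomes full-rank (invertible) once at least d_k independent vectors have been seen. The toy-scale demo below illustrates this; the dimension is shrunk from 6,400 to 64 purely so it runs instantly.

```python
import torch

# Toy-scale illustration of the rank argument behind the d_k minimum.
# For GPT2-XL the real key dimension is d_k = 6,400 (4 x hidden size 1,600);
# we use 64 here so the rank computation is instantaneous.
d_k = 64

for n in (16, 64, 640):
    K = torch.randn(n, d_k)   # n random key vectors standing in for cached activations
    C = K.T @ K               # the d_k x d_k statistic the editing update relies on
    rank = torch.linalg.matrix_rank(C).item()
    print(f"n={n:4d}  rank(C)={rank:3d}  full_rank={rank == d_k}")

# rank(C) = min(n, d_k): with fewer than d_k vectors C is singular,
# and once n >= d_k, additional vectors only refine the estimate.
```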
Key findings:
- 10× Minimum Works Best: While the theoretical minimum is d_k vectors (e.g., 6,400 for GPT2-XL), using 10× that amount (64,000) achieves 95%+ of full performance (a subsampling sketch follows this list)
- Massive Time Savings:
  - GPT2-XL: 12.8k tokens vs. 44M (0.03%)
  - GPT-J: 32k tokens vs. 44M (0.07%)
  - Precomputation completes in minutes vs. days
- Performance Maintained: Editing quality (efficacy, paraphrase, neighborhood scores) remains within 5% of full precomputation
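How might those savings look in code? Below is a minimal sketch under the same illustrative assumptions as the earlier snippet: the collection loop is identical to the full pass, except that it exits as soon as a small multiple of d_k key vectors has been accumulated. The function name, module path, and defaults are hypothetical, not the released FastMEMIT implementation.

```python
import torch

def collect_subsampled_stats(model, tok, texts, layer, multiplier=10):
    """Accumulate C = sum k k^T at one MLP layer, stopping at multiplier * d_k vectors.

    Illustrative GPT-2-style module paths; defaults are hypothetical,
    not the released FastMEMIT code.
    """
    d_k = 4 * model.config.n_embd
    target = multiplier * d_k        # e.g. 64,000 vectors for GPT2-XL at 10x
    C = torch.zeros(d_k, d_k)
    seen = 0
    cache = []

    hook = model.transformer.h[layer].mlp.c_proj.register_forward_hook(
        lambda mod, inp, out: cache.append(inp[0].detach().reshape(-1, d_k))
    )
    try:
        for text in texts:
            batch = tok(text, return_tensors="pt", truncation=True, max_length=512)
            with torch.no_grad():
                model(**batch)
            k = cache.pop()
            C += k.T @ k
            seen += k.shape[0]
            if seen >= target:       # ~12.8k-64k tokens instead of ~44M
                break
    finally:
        hook.remove()
    return C / seen
```

At a multiplier of 2 this corresponds to the 12.8k-token GPT2-XL setting quoted above; at 10 it is the 64,000-vector setting.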
Why This Matters for Business
- Faster Experimentation: Teams can test edits on new models within minutes rather than waiting days
- Cost Reduction: Eliminates hundreds of GPU hours per model
- Scalability: Makes knowledge editing feasible for larger models where 44M precomputation would be prohibitive
The Fine Print
The approach works slightly differently across models:
- GPT models (GPT2-XL, GPT-J): a dynamic multiplier of 2-3× the theoretical minimum suffices
- Llama2: requires a multiplier of 10 for stable performance
- Small edit batches (fewer than 10 edits) need minor additional regularization (see the sketch after this list)
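The exact regularizer for small batches isn't spelled out here, but a common way to keep such updates stable when the cached statistic is built from relatively few vectors is ridge-style damping: add a small multiple of the identity before the matrix is used in the edit. The sketch below shows that generic trick under that assumption; `lam` and the function name are illustrative.

```python
import torch

def damped_statistic(C: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Ridge-style damping: C + lam * I.

    A generic stabilization trick for statistics built from few key vectors,
    shown as an illustration of "minor regularization"; not necessarily the
    exact scheme used by FastMEMIT.
    """
    d_k = C.shape[0]
    return C + lam * torch.eye(d_k, dtype=C.dtype, device=C.device)

# Usage: feed the damped statistic into the editing update instead of the raw C,
# e.g. solve against damped_statistic(C) rather than C when batches are small.
```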
"This isn't just about saving compute," notes co-author Thomas Hartvigsen. "It fundamentally changes how quickly we can adapt models to new information—critical for business applications where facts change rapidly."
What's Next
The team has open-sourced their implementation, opening the door for:
- Real-time model updating systems
- More frequent fact-checking cycles
- Practical applications in legal, medical, and financial domains where accuracy is critical
As AI models continue to grow, techniques like FastMEMIT that reduce the overhead of model maintenance will become increasingly valuable. The era of waiting days to correct an AI's knowledge may soon be over.