FastMEMIT: Slashing Precomputation Time for AI Knowledge Editing by 99.7%
The Hidden Cost of Editing AI Brains
Editing the knowledge inside large language models (LLMs) has become a hot topic in AI research. Techniques like MEMIT, ROME, and EMMET allow us to update facts in transformer models without retraining—think of it like performing brain surgery on an AI. But there's a catch: before you can make any edits, you need to run an expensive "precomputation" step that can take days on a single GPU.
New research from UC Berkeley and University of Virginia reveals we've been massively overcomputing. The team discovered that these editing methods can work just as well with less than 0.3% of the original precomputation requirements, potentially saving hundreds of hours of GPU time when editing new models.
The Precomputation Bottleneck
Current "locate-then-edit" methods like MEMIT require caching hidden representations from millions of Wikipedia tokens before editing can begin. For GPT-J (6B), this means:
- 44 million hidden vectors per layer
- 36 hours on an NVIDIA A6000 GPU
- Similar scaling for larger models (40 hours for Llama2-7B)
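To make the bottleneck concrete, here is a minimal sketch of what a MEMIT-style precomputation pass looks like: run raw text through the model, capture the MLP "key" activations at an edited layer with a forward hook, and accumulate their second-moment statistic over every token. The model choice (GPT2-XL as a lighter stand-in for GPT-J), the layer index, the module path, and the `wiki_texts` placeholder are all illustrative assumptions, not the authors' released pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative settings; the real MEMIT configs target specific edited layers.
MODEL_NAME = "gpt2-xl"   # lighter stand-in for GPT-J (6B) to keep the sketch runnable
LAYER = 17               # hypothetical choice of edited MLP layer

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

d_k = 4 * model.config.n_embd   # dimension of the MLP "key" activations (6,400 for GPT2-XL)
C = torch.zeros(d_k, d_k)       # running statistic: sum over tokens of k k^T
n_vectors = 0
cache = []

# The MLP down-projection receives the key vectors as its input, so hook it.
hook = model.transformer.h[LAYER].mlp.c_proj.register_forward_hook(
    lambda mod, inp, out: cache.append(inp[0].detach().reshape(-1, d_k))
)

# Placeholder corpus; the full recipe streams tens of millions of Wikipedia tokens.
wiki_texts = ["The Eiffel Tower is located in Paris."]

for text in wiki_texts:
    batch = tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        model(**batch)
    k = cache.pop()
    C += k.T @ k            # accumulate k k^T for every token position
    n_vectors += k.shape[0]

hook.remove()
C /= n_vectors              # uncentered covariance of the key vectors
```

Run over the full 44M-token corpus, this loop is what consumes the 36+ GPU-hours quoted above.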
"It's like having to read the entire encyclopedia before you can correct a typo," explains lead researcher Akshat Gupta. "We found this is fundamentally unnecessary."
The Breakthrough: FastMEMIT
The team analyzed the mathematical foundations of editing algorithms and derived the theoretical minimum precomputation needed. For GPT2-XL, the cached key vectors live in the MLP's intermediate space of dimension d_k = 6,400 (4× the hidden dimension of 1,600), so only ~6,400 independent vectors are mathematically required, a tiny fraction of current practice.
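Where does that floor come from? The statistic cached during precomputation is a d_k × d_k matrix built from the key vectors, and its rank cannot exceed the number of vectors accumulated, so it only becomes full-rank (invertible) once at least d_k independent vectors have been seen. The toy-scale demo below illustrates this; the dimension is shrunk from 6,400 to 64 purely so it runs instantly.

```python
import torch

# Toy-scale illustration of the rank argument behind the d_k minimum.
# For GPT2-XL the real key dimension is d_k = 6,400 (4 x hidden size 1,600);
# we use 64 here so the rank computation is instantaneous.
d_k = 64

for n in (16, 64, 640):
    K = torch.randn(n, d_k)   # n random key vectors standing in for cached activations
    C = K.T @ K               # the d_k x d_k statistic the editing update relies on
    rank = torch.linalg.matrix_rank(C).item()
    print(f"n={n:4d}  rank(C)={rank:3d}  full_rank={rank == d_k}")

# rank(C) = min(n, d_k): with fewer than d_k vectors C is singular,
# and once n >= d_k, additional vectors only refine the estimate.
```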
Key findings:
- 10× Minimum Works Best: While the theoretical minimum is d_k vectors (e.g., 6,400 for GPT2-XL), using 10× that amount (64,000) achieves 95%+ of full performance (a subsampling sketch follows this list)
- Massive Time Savings:
  - GPT2-XL: 12.8k tokens vs. 44M (0.03%)
  - GPT-J: 32k tokens vs. 44M (0.07%)
  - Precomputation completes in minutes vs. days
- Performance Maintained: Editing quality (efficacy, paraphrase, neighborhood scores) remains within 5% of full precomputation
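How might those savings look in code? Below is a minimal sketch under the same illustrative assumptions as the earlier snippet: the collection loop is identical to the full pass, except that it exits as soon as a small multiple of d_k key vectors has been accumulated. The function name, module path, and defaults are hypothetical, not the released FastMEMIT implementation.

```python
import torch

def collect_subsampled_stats(model, tok, texts, layer, multiplier=10):
    """Accumulate C = sum k k^T at one MLP layer, stopping at multiplier * d_k vectors.

    Illustrative GPT-2-style module paths; defaults are hypothetical,
    not the released FastMEMIT code.
    """
    d_k = 4 * model.config.n_embd
    target = multiplier * d_k        # e.g. 64,000 vectors for GPT2-XL at 10x
    C = torch.zeros(d_k, d_k)
    seen = 0
    cache = []

    hook = model.transformer.h[layer].mlp.c_proj.register_forward_hook(
        lambda mod, inp, out: cache.append(inp[0].detach().reshape(-1, d_k))
    )
    try:
        for text in texts:
            batch = tok(text, return_tensors="pt", truncation=True, max_length=512)
            with torch.no_grad():
                model(**batch)
            k = cache.pop()
            C += k.T @ k
            seen += k.shape[0]
            if seen >= target:       # ~12.8k-64k tokens instead of ~44M
                break
    finally:
        hook.remove()
    return C / seen
```

At a multiplier of 2 this corresponds to the 12.8k-token GPT2-XL setting quoted above; at 10 it is the 64,000-vector setting.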
Why This Matters for Business
- Faster Experimentation: Teams can test edits on new models within minutes rather than waiting days
- Cost Reduction: Eliminates hundreds of GPU hours per model
- Scalability: Makes knowledge editing feasible for larger models where 44M precomputation would be prohibitive
The Fine Print
The approach works slightly differently across models:
- GPT models (GPT2-XL, GPT-J): a dynamic multiplier of 2-3× the theoretical minimum suffices
- Llama2: requires a multiplier of 10 for stable performance
- Small edit batches (fewer than 10 edits) need minor additional regularization (see the sketch after this list)
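The exact regularizer for small batches isn't spelled out here, but a common way to keep such updates stable when the cached statistic is built from relatively few vectors is ridge-style damping: add a small multiple of the identity before the matrix is used in the edit. The sketch below shows that generic trick under that assumption; `lam` and the function name are illustrative.

```python
import torch

def damped_statistic(C: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Ridge-style damping: C + lam * I.

    A generic stabilization trick for statistics built from few key vectors,
    shown as an illustration of "minor regularization"; not necessarily the
    exact scheme used by FastMEMIT.
    """
    d_k = C.shape[0]
    return C + lam * torch.eye(d_k, dtype=C.dtype, device=C.device)

# Usage: feed the damped statistic into the editing update instead of the raw C,
# e.g. solve against damped_statistic(C) rather than C when batches are small.
```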
"This isn't just about saving compute," notes co-author Thomas Hartvigsen. "It fundamentally changes how quickly we can adapt models to new information—critical for business applications where facts change rapidly."
What's Next
The team has open-sourced their implementation, opening the door for:
- Real-time model updating systems
- More frequent fact-checking cycles
- Practical applications in legal, medical, and financial domains where accuracy is critical
As AI models continue to grow, techniques like FastMEMIT that reduce the overhead of model maintenance will become increasingly valuable. The era of waiting days to correct an AI's knowledge may soon be over.