VIKI-R: How AI is Teaching Robots to Work Together Like Never Before
Imagine a kitchen where robots seamlessly collaborate: one fetches a mug from a high cabinet while another washes an apple
ALE-Bench: A New Benchmark for Evaluating AI’s Long-Horizon Algorithm Engineering Skills
ALE-Bench: Pushing AI to Solve Real-World Optimization Problems
In the rapidly evolving field of AI, benchmarks that once seemed challenging
Self Forcing: How Autoregressive Video Diffusion Models Are Closing the Train-Test Gap
The world of video generation is undergoing a quiet revolution. While diffusion models have dominated the field with their ability
GUI-Reflection: How AI Models Are Learning to Self-Correct Like Humans
Imagine you’re using an app and accidentally tap the wrong button. You immediately recognize the mistake, hit ‘back,’ and
StableMTL: How Latent Diffusion Models Are Revolutionizing Multi-Task Learning
The Challenge of Multi-Task Learning in AI
Multi-task learning (MTL) is a cornerstone of modern AI systems, especially in computer
Reflect-then-Plan: A Doubly Bayesian Approach to Offline Model-Based Planning
Offline reinforcement learning (RL) is a powerful tool for training policies when online exploration is costly or unsafe. However, it
Cartridges: A Memory-Efficient Alternative to In-Context Learning for Long Documents
Large language models (LLMs) are increasingly used to answer queries grounded in extensive text corpora, whether that’s
Distillation Robustifies Unlearning: A New Method to Make AI Forget
Large language models (LLMs) are trained on massive datasets, which means they inevitably learn things we’d rather they didn’
FARMS: Fixing Aspect Ratio Bias in Neural Network Eigenspectrum Analysis
Deep neural networks (DNNs) have become the backbone of modern AI systems, but understanding their inner workings remains a challenge.
Constrained Entropic Unlearning: A New Framework for Efficiently Removing Sensitive Data from LLMs
Large language models (LLMs) are increasingly deployed in real-world applications, but they often contain sensitive, outdated, or proprietary information that