IllumiCraft: The Future of Controllable Video Generation with Unified Geometry and Illumination Diffusion
The Challenge of Video Relighting
Lighting is one of the most critical elements in visual storytelling, transforming flat 2D scenes into dynamic, three-dimensional experiences. Whether it’s the golden glow of a sunset or the stark contrast of a spotlight, lighting shapes mood, depth, and realism. Yet, despite its importance, most AI-driven video generation tools treat lighting as an afterthought—an uncontrollable variable rather than a creative lever.
Enter IllumiCraft, a new diffusion-based framework that brings precise lighting and geometry control to video generation. Developed by researchers from the University of Oxford, UC Merced, NEC Labs America, Atmanity Inc., and Google DeepMind, this model doesn’t just tweak brightness—it understands how light interacts with 3D space, producing videos with physically accurate shadows, highlights, and reflections.
How IllumiCraft Works
Traditional video relighting methods struggle with two key challenges: temporal consistency (avoiding flickering or abrupt lighting changes) and physically plausible light-scene interactions (ensuring shadows and reflections move naturally with objects and camera motion). IllumiCraft tackles both by integrating three key inputs:
- HDR Video Maps – High-dynamic-range lighting data for fine-grained illumination control.
- Synthetically Relit Frames – Randomized lighting variations to teach the model how light affects appearance.
- 3D Point Tracks – Precise geometric data to ensure lighting reacts correctly to scene structure.
By combining these cues within a unified diffusion architecture, IllumiCraft generates videos where lighting adapts dynamically to motion while staying true to user-defined prompts. Whether you want a “moody blue spotlight piercing mist” or “natural sunlight filtering through trees,” the model delivers high-fidelity, temporally stable results.
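To make the conditioning concrete, here is a minimal PyTorch-style sketch of how HDR maps, relit frames, and 3D point tracks could be encoded and fused into a single conditioning tensor for a video diffusion backbone. The module names, tensor shapes, and pooling choices are illustrative assumptions for this post, not IllumiCraft's actual implementation.

```python
# Hypothetical sketch: fusing lighting and geometry conditions for a video
# diffusion model. Names and shapes are illustrative, not IllumiCraft's code.
import torch
import torch.nn as nn

class ConditionFusion(nn.Module):
    """Encodes HDR maps, relit frames, and 3D point tracks into one conditioning tensor."""
    def __init__(self, latent_dim: int = 320):
        super().__init__()
        # Lightweight encoders; a real system would likely reuse a pretrained video VAE.
        self.hdr_enc = nn.Conv3d(3, latent_dim, kernel_size=3, padding=1)
        self.relit_enc = nn.Conv3d(3, latent_dim, kernel_size=3, padding=1)
        self.track_enc = nn.Linear(3, latent_dim)  # per-point xyz -> feature

    def forward(self, hdr_video, relit_video, point_tracks):
        # hdr_video, relit_video: (B, 3, T, H, W); point_tracks: (B, T, N, 3)
        hdr_feat = self.hdr_enc(hdr_video)        # (B, C, T, H, W)
        relit_feat = self.relit_enc(relit_video)  # (B, C, T, H, W)
        track_feat = self.track_enc(point_tracks) # (B, T, N, C)
        # Pool point features over N and broadcast across spatial dims.
        track_feat = track_feat.mean(dim=2)                        # (B, T, C)
        track_feat = track_feat.permute(0, 2, 1)[..., None, None]  # (B, C, T, 1, 1)
        track_feat = track_feat.expand_as(hdr_feat)
        # Concatenate along channels; the diffusion backbone would consume this
        # alongside the noisy video latents and the text prompt embedding.
        return torch.cat([hdr_feat, relit_feat, track_feat], dim=1)  # (B, 3C, T, H, W)
```

In practice, a system like this would likely inject the fused features through channel concatenation or cross-attention; the point is simply that lighting and geometry arrive as separate, explicit signals rather than being inferred implicitly from the input video.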
Key Innovations
- Geometry-Aware Lighting – Unlike previous methods that rely solely on implicit lighting cues, IllumiCraft explicitly models 3D geometry, ensuring shadows and highlights move realistically with objects.
- Background-Conditioned Relighting – Users can provide a static background image, allowing the model to maintain scene context while altering illumination (see the sketch after this list).
- A Massive New Dataset – The team curated 20,170 video pairs with synchronized relighting, HDR maps, and 3D tracking data—a valuable resource for future research.
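As a rough intuition for background-conditioned relighting, the sketch below alpha-composites relit foreground frames over a single static background image. IllumiCraft learns this conditioning inside the diffusion model rather than compositing pixels directly, so treat the function, its arguments, and the toy tensors as illustrative assumptions only.

```python
# Hypothetical sketch: using a user-provided static background as scene context
# while the foreground is relit. All names and shapes are illustrative assumptions.
import torch

def composite_over_background(relit_fg: torch.Tensor,
                              fg_mask: torch.Tensor,
                              background: torch.Tensor) -> torch.Tensor:
    """Alpha-composite relit foreground frames over one static background.

    relit_fg:   (T, 3, H, W) relit foreground frames
    fg_mask:    (T, 1, H, W) soft foreground mask in [0, 1]
    background: (3, H, W)    static background image
    returns:    (T, 3, H, W) composited video frames
    """
    bg = background.unsqueeze(0)  # (1, 3, H, W), broadcast over time
    return fg_mask * relit_fg + (1 - fg_mask) * bg

# Example with toy tensors (49 frames at 256x256):
frames = composite_over_background(
    relit_fg=torch.rand(49, 3, 256, 256),
    fg_mask=torch.rand(49, 1, 256, 256),
    background=torch.rand(3, 256, 256),
)
```

The learned variant has the advantage that illumination changes can interact naturally between foreground and background (for example, shadows cast onto the backdrop), which pure pixel compositing cannot capture.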
Performance & Benchmarks
In tests against leading video relighting models like IC-Light, RelightVid, and Light-A-Video, IllumiCraft outperformed them all. Key metrics:
- 43% lower FVD (Fréchet Video Distance) than the best baseline.
- Higher text alignment (CLIP similarity) and temporal consistency (see the metric sketch after this list).
- Faster inference (105 seconds for a 49-frame video vs. 645 seconds for Light-A-Video).
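For readers who want to check text alignment and temporal consistency on their own outputs, the snippet below shows one common way such CLIP-based metrics are computed: average frame-to-prompt cosine similarity, and average similarity between consecutive frame embeddings. It reflects standard practice rather than the paper's exact evaluation code, and the model checkpoint and frame list are placeholders.

```python
# Sketch of common relighting-evaluation metrics: per-frame CLIP text alignment
# and CLIP-based temporal consistency. Not the authors' exact evaluation code.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(frames, prompt):
    """frames: list of PIL images; prompt: text description of the target lighting."""
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Normalize embeddings so dot products are cosine similarities.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)  # (T, D)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)    # (1, D)
    text_alignment = (img @ txt.T).mean().item()                       # frame-to-prompt
    temporal_consistency = (img[:-1] * img[1:]).sum(-1).mean().item()  # consecutive frames
    return text_alignment, temporal_consistency
```

FVD, by contrast, compares feature distributions of generated and real videos using a pretrained video network, which is why lower values indicate closer agreement with real footage.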
Qualitatively, IllumiCraft preserves fine details—like the texture of a rabbit’s fur or the gloss of an apple—where competitors often blur or oversmooth. It also handles complex lighting scenarios, such as moving spotlights or dynamic shadows, with remarkable stability.
Applications & Future Work
This technology has immediate applications in film post-production, virtual production, and advertising, where precise lighting control is essential. Future improvements could address edge cases (like occlusions in strong directional light) and expand the dataset to include more challenging lighting scenarios.
Ethical Considerations
As with any powerful generative tool, there’s potential for misuse—hyper-realistic relit videos could deepen concerns around deepfakes. The researchers emphasize the need for safeguards and detection methods to mitigate risks.
Final Thoughts
IllumiCraft represents a major leap forward in controllable video generation, bridging the gap between artistic intent and AI execution. By unifying geometry and illumination, it opens new creative possibilities while pushing the boundaries of what’s possible in AI-assisted video editing.
For more details, check out the full paper on arXiv or visit the project page.