FlexiAct: The AI That Brings Any Image to Life with Custom Actions
Imagine taking a static image—a cartoon character, a pet, or even a painting—and making it move exactly how you want. That’s the promise of FlexiAct, a new AI framework developed by researchers from Tsinghua University and Tencent ARC Lab. Unlike existing tools that require strict alignment between reference videos and target images, FlexiAct can transfer actions from one subject to another, even if they have completely different shapes, poses, or viewpoints.
Why FlexiAct Matters
Action transfer isn’t new—Hollywood has been doing it for years with motion capture and painstaking animation. But traditional methods are expensive and time-consuming. AI-powered alternatives exist, but they come with limitations:
- Pose-based methods (like AnimateAnyone) require perfect skeleton alignment between the reference and target, making them useless for non-human subjects or mismatched poses.
- Global motion methods (like MotionDirector) can’t adapt actions to different subjects—they just replicate camera movements or broad motions.
FlexiAct breaks these constraints. It can make a cat mimic a human’s dance, animate a cartoon character from a real-world video, or even transfer movements between two animals with different body structures. And it does this while preserving the target’s appearance—no weird distortions or mismatched details.
How FlexiAct Works
The magic lies in two key innovations:
- RefAdapter – A lightweight adapter that ensures the generated video stays true to the target image’s appearance, even when the reference video has a totally different layout or viewpoint. It’s trained to handle arbitrary frames as input, not just the first frame of a video, making it far more flexible than existing methods.
- Frequency-Aware Action Extraction (FAE) – Instead of relying on separate motion and appearance models, FAE dynamically adjusts what it pays attention to during the denoising process. Early steps focus on broad motion (low-frequency information); later steps refine appearance (high-frequency detail). This lets it extract and transfer actions more precisely (a rough sketch of the idea follows below).
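To make the FAE idea concrete, here is a minimal PyTorch sketch of how a timestep-dependent weighting could shift emphasis from motion to appearance across a denoising loop. The function name, the linear schedule, and the stand-in feature tensors are illustrative assumptions, not FlexiAct's actual implementation:

```python
import torch

def fae_blend(motion_feats: torch.Tensor,
              appearance_feats: torch.Tensor,
              step: int,
              total_steps: int) -> torch.Tensor:
    """Toy frequency-aware schedule: weight motion (low-frequency) features
    heavily at early denoising steps, then shift the weight toward
    appearance (high-frequency) features at later steps. The linear
    schedule is a placeholder, not the paper's actual weighting."""
    progress = step / max(total_steps - 1, 1)   # 0.0 at the start, 1.0 at the end
    w_motion = 1.0 - progress                   # broad motion dominates early
    w_appearance = progress                     # fine detail dominates late
    return w_motion * motion_feats + w_appearance * appearance_feats

# Example: a 50-step denoising loop (feature extraction itself is omitted;
# random tensors stand in for the model's intermediate features).
total_steps = 50
for step in range(total_steps):
    motion_feats = torch.randn(1, 64)        # stand-in for low-frequency features
    appearance_feats = torch.randn(1, 64)    # stand-in for high-frequency features
    blended = fae_blend(motion_feats, appearance_feats, step, total_steps)
```

The key design choice this sketch illustrates is that a single model reuses the same attention machinery throughout generation, and only the schedule decides whether motion or appearance dominates at a given step.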
Real-World Performance
The team tested FlexiAct on a diverse dataset, including humans, animals, and animated characters. In side-by-side comparisons, it outperformed baselines like MotionDirector in both motion fidelity (how well the action matches the reference) and appearance consistency (how well the subject looks like the original image).
For example:
- The yoga poses from a reference video of a person were successfully transferred to a cartoon character with a different body shape.
- A dog’s running motion was applied to a cat, despite their different gaits.
- Even cross-domain transfers (human → animal) worked convincingly.
Limitations and Future Work
FlexiAct isn’t perfect. Like other diffusion-based video tools, it requires per-video fine-tuning, meaning you can’t just plug in any reference clip instantly. The researchers suggest that future work could explore feed-forward methods to make the process faster.
The Big Picture
FlexiAct opens up exciting possibilities for content creation—think personalized animations, game character movements, or even animating historical photos. By decoupling action from rigid structural constraints, it brings us closer to AI that can truly "animate anything."
For more technical details, check out the full paper on arXiv. The team has also released their code and model weights to encourage further research.
This Moment in A.I. is your go-to source for breaking down the latest AI research shaping business and creativity. Stay tuned for more deep dives into the tools transforming industries.