FORTRESS: How AI is Making Robots Safer in Unpredictable Environments
Autonomous robots are increasingly operating in unstructured, open-world environments—from delivery drones navigating urban landscapes to quadruped robots inspecting construction sites. But what happens when these robots encounter scenarios far outside their training data? A new framework called FORTRESS, developed by researchers at Stanford University and NVIDIA, aims to prevent these "out-of-distribution" (OOD) failures by leveraging multi-modal AI reasoning in real time.
The Challenge of Open-World Robotics
Traditional robotic systems excel in controlled environments but struggle when faced with novel, unpredictable scenarios. A drone trained to avoid stationary obstacles might falter when confronted with a burning building or a crowded rooftop party. Similarly, an ANYmal robot navigating a construction site might misinterpret a worker on a ladder as two separate, harmless objects rather than a potential hazard.
Current approaches often rely on rigid, pre-defined fallback behaviors or human intervention—neither of which scales well for robots operating in dynamic, real-world settings. FORTRESS bridges this gap by combining the high-level reasoning of foundation models (like LLMs and VLMs) with real-time motion planning.
How FORTRESS Works
The framework operates in two phases:
- Low-Frequency Preparation: During normal operation, FORTRESS uses vision-language models (VLMs) to identify potential fallback goals (e.g., safe rooftops for a drone) and to anticipate failure modes (such as "high temperature" or "chemical spill"). It then calibrates semantic safety cost functions so unsafe regions can be identified quickly during a crisis.
- Real-Time Response: When a runtime monitor detects an anomaly, FORTRESS rapidly synthesizes a fallback plan that avoids semantically unsafe areas while adjusting on the fly to reach a safe goal. This combines embedding-based semantic reasoning with reach-avoid planning techniques; a simplified sketch of the two-phase loop follows this list.
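To make the split concrete, here is a toy Python sketch of the two-phase control flow. Every name in it (propose_fallback_goals, anticipate_failure_modes, semantic_cost, FortressPlanner) is an illustrative stand-in rather than the authors' actual API, and the foundation-model queries are replaced with placeholder stubs so the example runs anywhere:

```python
# Toy sketch of FORTRESS's two-phase structure: slow foundation-model work
# happens in prepare(); respond() uses only pre-computed artifacts.
from dataclasses import dataclass, field


def propose_fallback_goals(observation):
    """Stand-in for a VLM query such as 'find empty rooftops to land on'."""
    return [(10.0, 4.0, 25.0)]  # one candidate goal position (x, y, z)


def anticipate_failure_modes(observation):
    """Stand-in for a language-model query anticipating hazards nearby."""
    return ["high temperature", "chemical spill", "person on a ladder"]


def semantic_cost(region_label, unsafe_concepts):
    """Stand-in for the calibrated semantic safety cost (positive = unsafe)."""
    return 1.0 if region_label in unsafe_concepts else -1.0


@dataclass
class FortressPlanner:
    goals: list = field(default_factory=list)
    unsafe_concepts: list = field(default_factory=list)

    def prepare(self, observation):
        """Low-frequency phase: slow model calls, run during nominal
        operation so nothing expensive happens in an emergency."""
        self.goals = propose_fallback_goals(observation)
        self.unsafe_concepts = anticipate_failure_modes(observation)

    def respond(self, labeled_regions, pose):
        """Real-time phase: no model calls, only pre-computed data."""
        unsafe = {pos for pos, label in labeled_regions
                  if semantic_cost(label, self.unsafe_concepts) > 0.0}
        # A real system would run reach-avoid planning here; this sketch
        # just returns the nearest pre-computed goal outside unsafe regions.
        safe_goals = [g for g in self.goals if g not in unsafe]
        return min(safe_goals,
                   key=lambda g: sum((a - b) ** 2 for a, b in zip(g, pose)))


planner = FortressPlanner()
planner.prepare(observation=None)  # runs ahead of time, in the background
regions = [((12.0, 3.0, 0.0), "high temperature"),  # (position, label)
           ((10.0, 4.0, 25.0), "empty rooftop")]
print(planner.respond(regions, pose=(0.0, 0.0, 20.0)))  # -> (10.0, 4.0, 25.0)
```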
Key Innovations
- Multi-Modal Reasoning: FORTRESS translates abstract safety strategies (e.g., "land on an empty roof") into concrete, executable plans by querying VLMs for goal locations and using depth data to map them into 3D space (a back-projection sketch appears after this list).
- Semantic Safety Cost Functions: By calibrating text embeddings against known-safe data, the system can infer that a new scenario (e.g., "person on a ladder") is dangerously OOD even if it has never encountered that scenario before (see the embedding-calibration sketch after this list).
- Real-Time Planning: Unlike traditional methods that rely on slow, on-the-fly queries to large models, FORTRESS pre-computes safety constraints and fallback options, enabling sub-second response times during emergencies.
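Grounding a VLM's answer in 3D is, at its core, standard pinhole back-projection from a depth image. The minimal sketch below illustrates that step; the pixel coordinates and camera intrinsics are placeholder values, not taken from the paper:

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Standard pinhole back-projection: lift pixel (u, v) with metric
    depth into a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Example: suppose the VLM reports an empty rooftop at pixel (412, 230)
# and the depth image reads 18.5 m there. The intrinsics are made up.
goal_cam = backproject(u=412, v=230, depth=18.5,
                       fx=525.0, fy=525.0, cx=320.0, cy=240.0)
print(goal_cam)  # goal in the camera frame; a real pipeline would then
                 # transform it into the world frame for the planner
```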
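The embedding-calibration idea can also be sketched compactly: score a new observation by its similarity to anticipated failure modes, with the decision threshold set from known-safe data. To keep the example self-contained, the sketch below swaps the real text encoder for a toy hashed bag-of-words; embed, calibrate_threshold, and is_unsafe are illustrative names, not the paper's implementation:

```python
import zlib
import numpy as np

def embed(text):
    """Toy stand-in for a real text encoder (e.g., a CLIP text tower):
    a deterministic hashed bag-of-words, enough to show the mechanics."""
    vec = np.zeros(4096)
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % 4096] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def calibrate_threshold(failure_modes, safe_descriptions):
    """Calibrate on known-safe data: the threshold is the highest
    similarity any safe description reaches against any failure mode."""
    return max(embed(s) @ embed(f)
               for s in safe_descriptions for f in failure_modes)

def is_unsafe(description, failure_modes, threshold):
    """Flag an observation as dangerously OOD if it resembles an
    anticipated failure mode more than anything in the safe set did."""
    return max(embed(description) @ embed(f) for f in failure_modes) > threshold

failure_modes = ["burning building", "person on a ladder", "chemical spill"]
safe_data = ["empty rooftop", "flat parking lot", "open field"]
threshold = calibrate_threshold(failure_modes, safe_data)

print(is_unsafe("worker standing on a tall ladder", failure_modes, threshold))  # True
print(is_unsafe("clear empty rooftop", failure_modes, threshold))               # False
```

Because the threshold comes from calibration data rather than a hand-tuned constant, the same mechanism flags hazards the robot has never seen, which is the point of the semantic safety cost.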
Performance and Applications
In benchmarks, FORTRESS outperformed baselines in safety classification accuracy (achieving >90% balanced accuracy on synthetic and real-world datasets). On hardware, it enabled a quadrotor drone to avoid landing near hazards like burning buildings and guided an ANYmal robot around unsafe construction zones.
Why This Matters for Business
For industries deploying autonomous systems—from logistics to infrastructure inspection—FORTRESS represents a leap forward in reliability. By reducing the need for human oversight and enabling robots to handle edge cases autonomously, it lowers operational risks and expands the feasible deployment domains for robotics.
The framework’s ability to "reason" about safety in human-like terms (e.g., avoiding "worker injury" rather than just collision) also aligns with regulatory and ethical priorities as AI-powered automation scales.
Limitations and Future Work
The current system requires predefined fallback strategies (like "land on roofs") and static safety radii. Future iterations could dynamically adapt constraints based on context (e.g., wind conditions) or learn strategies from operational handbooks.
As robots venture further into our unstructured world, FORTRESS offers a blueprint for keeping them—and the humans around them—safe.