What Is Cosmos 3?
Previous NVIDIA Cosmos models were purpose-built: Cosmos Predict for future prediction, Cosmos Transfer for domain transfer, Cosmos Reason for physical AI understanding, and Cosmos Policy for action generation. Cosmos 3 unifies all of these capabilities into a single open omnimodel.
The model accepts text, images, video, ambient sound, and actions as input — and can generate any combination of those modalities as output. A single model checkpoint now covers the full pipeline: visual reasoning → physically accurate synthetic video generation → action output.
NVIDIA's core pitch: physical AI systems should think before they act. The company positions Cosmos 3 as the infrastructure that makes this possible at scale, and says it can cut training and evaluation cycles from months to days.
Two Models: Nano and Super
Cosmos 3 ships in two configurations, Nano and Super, with a third, Edge, announced as coming soon:
| Model | Parameters | Primary Use Case | Target Hardware |
|---|---|---|---|
| Cosmos 3 Nano | 16B | Real-time robotics inference, edge deployment | NVIDIA RTX PRO 6000 |
| Cosmos 3 Super | 64B | Large-scale synthetic data generation, advanced physical reasoning | NVIDIA Hopper / Blackwell |
| Cosmos 3 Edge | Coming soon | Real-time inference at the edge | Edge devices |
Nano is designed to run on workstation-class GPUs for real-time robotics applications. Super targets datacenter GPUs for high-fidelity synthetic data generation and the most demanding physical reasoning tasks.
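As a rough way to see why the two sizes target different hardware, the sketch below estimates the memory needed just to hold the weights. The precisions (`bf16`, `fp8`) are assumptions for illustration; the article does not say which formats the checkpoints ship in, and the figures exclude activations, caches, and runtime overhead.

```python
# Bytes needed to store one parameter at a given precision (assumed formats).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts come from the table above; everything else is illustrative.
for name, size_b in (("Nano", 16), ("Super", 64)):
    for dtype in ("bf16", "fp8"):
        print(f"Cosmos 3 {name} ({size_b}B) in {dtype}: "
              f"{weight_memory_gb(size_b, dtype):.0f} GB of weights")
```

Under these assumptions Super needs four times the weight memory of Nano at any given precision, which is consistent with the workstation versus datacenter split, though real deployments should be sized against measured usage rather than this estimate.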
Mixture-of-Transformers Architecture
The central technical change in Cosmos 3 is its Mixture-of-Transformers (MoT) architecture. This design folds several traditionally separate model types (vision-language models (VLM), world models, world action models (WAM), and vision-language-action (VLA) models) into a single unified system.
This means that post-training Cosmos 3 on a specific domain transforms it into a world action model capable of perceiving its environment, reasoning about physics, generating plausible futures, and producing appropriate actions — all from a single checkpoint.
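The article does not describe the internals of the MoT design, so the following is only a toy sketch of one common reading of the term: tokens from every modality share a single global attention step, then each token passes through a feed-forward block chosen by its modality. The names `shared_attention`, `make_expert`, and `mot_layer` are hypothetical, the "experts" are simple scaled ReLUs, and residual connections and normalization are omitted. None of this is taken from the Cosmos 3 code.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Token = Tuple[str, Vector]  # (modality tag, embedding)


def shared_attention(vectors: List[Vector]) -> List[Vector]:
    """Unparameterised global self-attention: every token attends to every token."""
    dim = len(vectors[0])
    mixed = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        mixed.append([sum(w / total * v[i] for w, v in zip(weights, vectors))
                      for i in range(dim)])
    return mixed


def make_expert(scale: float) -> Callable[[Vector], Vector]:
    """Toy stand-in for a modality-specific feed-forward block (scaled ReLU)."""
    return lambda x: [scale * max(0.0, xi) for xi in x]


def mot_layer(tokens: List[Token],
              experts: Dict[str, Callable[[Vector], Vector]]) -> List[Token]:
    """Mix all tokens jointly, then route each one to its modality's block."""
    mixed = shared_attention([vec for _, vec in tokens])
    return [(modality, experts[modality](vec))
            for (modality, _), vec in zip(tokens, mixed)]


# Hypothetical modalities and scales, chosen only to make the routing visible.
experts = {"text": make_expert(1.0), "video": make_expert(2.0), "action": make_expert(0.5)}
tokens = [("text", [1.0, 0.0]), ("video", [1.0, 0.0])]
output = mot_layer(tokens, experts)
print(output)
```

The two input tokens carry identical embeddings, so any difference in the output comes from the modality-specific block. This illustrates how one checkpoint could hold specialized parameters per modality while still reasoning over a mixed sequence.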
Model weights for Cosmos 3 Nano and Super are available immediately on Hugging Face. Post-training scripts, example code, and six open-source SDG datasets are on GitHub at github.com/nvidia/cosmos. Developers can also try Cosmos 3 through NVIDIA NIM microservices — the Cosmos 3 Reasoner NIM is available now, with the Generator NIM following soon.
NVIDIA Cosmos Coalition
Alongside the model launch, NVIDIA announced the Cosmos Coalition — a collaboration between world model builders and AI developers. Founding members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI. The coalition is focused on advancing the next generation of open world models for physical AI applications.
Open-Source Datasets
NVIDIA simultaneously released six synthetic data generation (SDG) datasets covering:
- Robotics manipulation
- Physics simulation
- Spatial reasoning
- Human motion
- Autonomous driving
- Warehouse environments
These datasets are designed for post-training Cosmos 3 and other physical AI models, and are available on Hugging Face alongside the model weights.
Availability
- Cosmos 3 Nano & Super: Available now on Hugging Face and GitHub
- Cosmos 3 Edge: Coming soon for real-time edge inference
- Cosmos 3 Reasoner NIM: Available now via NVIDIA NIM microservices
- Cosmos 3 Generator NIM: Coming soon
- Cloud deployment: Baseten, CoreWeave, Microsoft Azure, Nebius, Deep Infra, Classmethod
Why World Models Matter for Physical AI
Training robots and autonomous vehicles requires vast amounts of real-world data, which is expensive to collect and nearly impossible to gather for rare edge cases. Cosmos 3 addresses this by generating physically accurate synthetic video and action scenarios at scale, letting teams train and evaluate models against conditions they couldn't easily reproduce in the real world.
Key Takeaways
- NVIDIA bills Cosmos 3 as the world's first fully open omnimodel, handling text, image, video, sound, and action I/O in a single model
- Mixture-of-Transformers architecture unifies VLM, world model, WAM, and VLA into one checkpoint
- Available now in two sizes: Nano (16B) for workstation and edge deployment, Super (64B) for datacenter; an Edge variant is announced as coming soon
- NVIDIA Cosmos Coalition launched with Agile Robots, Runway, and other partners
- Six open-source SDG datasets released alongside model weights on Hugging Face