NVIDIA Launches Cosmos 3 — Open World Foundation Model for Physical AI

At Computex 2026, NVIDIA unveiled Cosmos 3, which it bills as the world's first fully open omnimodel, combining vision reasoning, world generation, and action prediction in a single system. NVIDIA says it cuts physical AI training cycles from months to days.

NVIDIA officially launched Cosmos 3 at GTC Taipei (Computex 2026) on May 31, 2026. Built on a Mixture-of-Transformers architecture, the model natively handles text, images, video, ambient sound, and actions in a single system. Model weights are freely available on Hugging Face, and the launch is accompanied by six open-source synthetic datasets for robotics, autonomous driving, and other physical AI domains.

What Is Cosmos 3?

Previous NVIDIA Cosmos models were purpose-built: Cosmos Predict for future prediction, Cosmos Transfer for domain transfer, Cosmos Reason for physical AI understanding, and Cosmos Policy for action generation. Cosmos 3 unifies all of these capabilities into a single open omnimodel.

The model accepts text, images, video, ambient sound, and actions as input — and can generate any combination of those modalities as output. A single model checkpoint now covers the full pipeline: visual reasoning → physically accurate synthetic video generation → action output.
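To make "any combination of those modalities" concrete, the short sketch below enumerates every non-empty set of output modalities a request could ask for. The modality labels and the function name are illustrative assumptions, not identifiers from NVIDIA's code.

```python
from itertools import combinations

# The five modalities named in the article; the string labels are illustrative.
MODALITIES = ("text", "image", "video", "sound", "action")


def output_combinations(modalities=MODALITIES):
    """Return every non-empty set of modalities, as tuples."""
    return [
        combo
        for size in range(1, len(modalities) + 1)
        for combo in combinations(modalities, size)
    ]


combos = output_combinations()
print(len(combos))  # 2**5 - 1 = 31 non-empty combinations
```

Five modalities give 31 possible output sets, and the same count applies to input sets, which is the space that previously required several purpose-built models to cover.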

NVIDIA's core pitch: physical AI systems should think before they act. The company positions Cosmos 3 as the infrastructure that makes this possible at scale, and says it reduces training and evaluation cycles from months to days.

Cosmos 3 Super: 64B parameters, datacenter-grade physical reasoning

Cosmos 3 Nano: 16B parameters, workstation and edge deployment

Physical AI training and evaluation cycle time: months → days

Two Models: Nano and Super

Cosmos 3 ships in two configurations, Nano and Super, with a third Edge variant announced as coming soon:

Model	Parameters	Primary Use Case	Target Hardware
Cosmos 3 Nano	16B	Real-time robotics inference, edge deployment	NVIDIA RTX PRO 6000
Cosmos 3 Super	64B	Large-scale synthetic data generation, advanced physical reasoning	NVIDIA Hopper / Blackwell
Cosmos 3 Edge	Coming soon	Real-time inference at the edge	Edge devices

Nano is designed to run on workstation-class GPUs for real-time robotics applications. Super targets datacenter GPUs for high-fidelity synthetic data generation and the most demanding physical reasoning tasks.
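The hardware split follows largely from how much memory the weights alone occupy. The sketch below is a back-of-the-envelope estimate that assumes "16B" and "64B" mean 16 and 64 billion parameters. The precisions listed are illustrative, since the article does not say which numeric format the released checkpoints use, and the figures exclude activations and any other runtime memory.

```python
# Bytes needed to store one parameter at common precisions (illustrative choices).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

# Parameter counts as stated in the article.
PARAMS = {"Cosmos 3 Nano": 16_000_000_000, "Cosmos 3 Super": 64_000_000_000}


def weight_gb(num_params: int, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, count in PARAMS.items():
    for dtype in BYTES_PER_PARAM:
        print(f"{name:15s} {dtype:5s} {weight_gb(count, dtype):6.0f} GB")
```

At 2 bytes per parameter, Nano's weights come to about 32 GB and Super's to about 128 GB, which is consistent with Nano targeting a single workstation GPU and Super targeting datacenter hardware.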

Mixture-of-Transformers Architecture

The technical breakthrough in Cosmos 3 is its Mixture-of-Transformers (MoT) architecture. This design subsumes multiple traditionally separate architectures — vision-language models (VLM), world models, world action models (WAM), and vision-language-action (VLA) models — into a single unified system.

This means that post-training Cosmos 3 on a specific domain transforms it into a world action model capable of perceiving its environment, reasoning about physics, generating plausible futures, and producing appropriate actions — all from a single checkpoint.
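The article does not describe the internals of Cosmos 3, so the following is only a toy illustration of the general idea usually associated with the Mixture-of-Transformers name: tokens from all modalities attend to one another in a single shared sequence, while each token's feed-forward step uses weights chosen by its modality. It is an assumption that Cosmos 3 works this way. Every name here is hypothetical, and a single scalar per modality stands in for what would be full weight matrices.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Token = Tuple[str, Vector]  # (modality tag, hidden state)


def global_attention(states: List[Vector]) -> List[Vector]:
    """Softmax dot-product attention over the whole mixed-modality sequence."""
    out = []
    for query in states:
        scores = [sum(a * b for a, b in zip(query, key)) for key in states]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * value[d] for w, value in zip(weights, states))
            for d in range(len(query))
        ])
    return out


def mot_layer(tokens: List[Token], ffn_scale: Dict[str, float]) -> List[Token]:
    """One toy layer: shared attention, then a feed-forward picked by modality."""
    mixed = global_attention([vec for _, vec in tokens])
    result = []
    for (modality, _), vec in zip(tokens, mixed):
        scale = ffn_scale[modality]  # stand-in for that modality's own weights
        result.append((modality, [max(0.0, scale * x) for x in vec]))
    return result


tokens = [("text", [1.0, 1.0]), ("video", [1.0, 1.0])]
print(mot_layer(tokens, {"text": 1.0, "video": 2.0}))
```

In this example the two tokens carry identical hidden states, so attention treats them the same, yet they leave the layer with different values because each is routed to its own modality's feed-forward parameters.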

Getting Started
Model weights for Cosmos 3 Nano and Super are available immediately on Hugging Face. Post-training scripts, example code, and six open-source SDG datasets are on GitHub at github.com/nvidia/cosmos. Developers can also try Cosmos 3 through NVIDIA NIM microservices — the Cosmos 3 Reasoner NIM is available now, with the Generator NIM following soon.
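For readers who want to pull the weights programmatically, a minimal sketch using the `huggingface_hub` library's `snapshot_download` function might look like the following. The repository id is a placeholder, because the article does not give the exact Hugging Face path, and the file patterns assume the checkpoint is stored as safetensors files with JSON configuration.

```python
from pathlib import Path

# Placeholder repository id: the article does not state the real path.
ASSUMED_REPO_ID = "nvidia/Cosmos-3-Nano"


def download_args(repo_id: str, target_root: str) -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    local_dir = Path(target_root) / repo_id.split("/")[-1]
    return {
        "repo_id": repo_id,
        "local_dir": str(local_dir),
        # Assumed file layout: fetch only weight shards and JSON configuration.
        "allow_patterns": ["*.safetensors", "*.json"],
    }


def fetch_checkpoint(repo_id: str = ASSUMED_REPO_ID,
                     target_root: str = "checkpoints") -> str:
    """Download the repository snapshot and return its local path."""
    # Imported lazily so download_args works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_args(repo_id, target_root))
```

Calling `fetch_checkpoint()` requires `pip install huggingface_hub`, network access, and the correct repository id from the official model page. Given the model sizes, expect the download to run to tens of gigabytes or more.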

NVIDIA Cosmos Coalition

Alongside the model launch, NVIDIA announced the Cosmos Coalition — a collaboration between world model builders and AI developers. Founding members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI. The coalition is focused on advancing the next generation of open world models for physical AI applications.

Open-Source Datasets

NVIDIA simultaneously released six synthetic data generation (SDG) datasets covering:

Robotics manipulation
Physics simulation
Spatial reasoning
Human motion
Autonomous driving
Warehouse environments

These datasets are designed for post-training Cosmos 3 and other physical AI models, and are available on Hugging Face alongside the model weights.

Availability
— Cosmos 3 Nano & Super: Available now on Hugging Face and GitHub
— Cosmos 3 Edge: Coming soon for real-time edge inference
— Cosmos 3 Reasoner NIM: Available now via NVIDIA NIM microservices
— Cosmos 3 Generator NIM: Coming soon
— Cloud deployment: Baseten, CoreWeave, Microsoft Azure, Nebius, Deep Infra, Classmethod

Why World Models Matter for Physical AI

Training robots and autonomous vehicles requires vast amounts of real-world data, which is expensive to collect and nearly impossible to gather for rare edge cases. Cosmos 3 addresses this by generating physically accurate synthetic video and action scenarios at scale, letting teams train and evaluate models against conditions they could not easily reproduce in the real world.

Key Takeaways

Cosmos 3 is the world's first fully open omnimodel — handling text, image, video, sound, and action I/O in a single model
Mixture-of-Transformers architecture unifies VLM, world model, WAM, and VLA into one checkpoint
Available in two sizes: Nano (16B) for edge/workstation, Super (64B) for datacenter
NVIDIA Cosmos Coalition launched with Agile Robots, Runway, and other partners
Six open-source SDG datasets released alongside model weights on Hugging Face

Resources
— NVIDIA Newsroom — Official Cosmos 3 Press Release
— GitHub — NVIDIA Cosmos Open-Source Code & Examples
— NVIDIA Developer Blog — Technical Deep Dive
— NVIDIA Blog — "How Cosmos 3 Helps Physical AI Think Before It Acts"