OpenVLA

OpenVLA is an open-source 7B-parameter vision-language-action model from Stanford, Google DeepMind, Toyota Research Institute, and collaborators. It fine-tunes a Prismatic VLM (DINOv2 + SigLIP vision encoders with a Llama 2 7B backbone) on 970k robot trajectories from the Open X-Embodiment dataset and outputs discretised 7-DoF actions as tokens. Weights and training code are fully open under an MIT licence, making it the go-to baseline for academic robotics research on generalist policies.
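The "discretised actions" scheme can be sketched numerically: each of the 7 action dimensions is clipped to a per-dimension range and mapped into 256 uniform bins, and the language model emits one bin token per dimension. The 256-bin count follows the paper; the symmetric [-1, 1] ranges below are illustrative, not OpenVLA's actual per-dimension statistics.

```python
import numpy as np

# Illustrative sketch of per-dimension action tokenisation:
# 7-DoF continuous actions -> 256 uniform bins -> bin-centre decode.
N_BINS = 256

def discretise(action, low, high):
    """Map continuous actions to integer bin indices in [0, N_BINS - 1]."""
    action = np.clip(action, low, high)
    frac = (action - low) / (high - low)
    return np.minimum((frac * N_BINS).astype(int), N_BINS - 1)

def undiscretise(bins, low, high):
    """Recover the bin-centre continuous action from token indices."""
    return low + (bins + 0.5) / N_BINS * (high - low)

# Hypothetical ranges for the 7 dims (x, y, z, roll, pitch, yaw, gripper).
low, high = np.full(7, -1.0), np.full(7, 1.0)
a = np.array([0.1, -0.3, 0.0, 0.5, -1.0, 1.0, 0.0])
tokens = discretise(a, low, high)
recovered = undiscretise(tokens, low, high)
# Round-trip error is bounded by one bin width per dimension.
assert np.all(np.abs(recovered - a) <= (high - low) / N_BINS)
```

The coarseness of this binning is exactly the limitation noted below: decoded actions are bin centres, so the policy cannot produce motion smoother than one bin width.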

Model specs

Vendor
Stanford / Google DeepMind / TRI (collaboration)
Family
OpenVLA
Released
2024-06
Context window
2,048 tokens
Modalities
text, vision, actions

Strengths

  • Fully open weights + training code under permissive licence
  • Outperforms RT-2-X on common benchmarks despite being smaller
  • Efficient LoRA adaptation to new embodiments
  • Active research community and downstream forks
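The LoRA adaptation mentioned above comes down to freezing each pretrained weight matrix W and training only a low-rank correction B·A. A minimal numerical sketch of that idea, with illustrative dimensions and scaling that are not OpenVLA's actual hyper-parameters:

```python
import numpy as np

# LoRA sketch: adapt a frozen weight W (d_out x d_in) with a trainable
# rank-r correction B @ A, where r << min(d_out, d_in).
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero init

def forward(x, alpha=8.0):
    """Adapted layer: W x + (alpha / r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialised to zero, the adapter starts as an exact no-op,
# so fine-tuning begins from the pretrained behaviour.
assert np.allclose(forward(x), W @ x)

# Trainable parameter count: r * (d_in + d_out) vs d_in * d_out for full tuning.
full, lora = d_in * d_out, r * (d_in + d_out)
print(f"trainable params: {lora} vs {full} ({lora / full:.1%})")
```

This parameter ratio is why adapting OpenVLA to a new embodiment is far cheaper than retraining the full 7B model.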

Limitations

  • Action space is tokenised — not as smooth as flow-matching policies like π0
  • Single-arm table-top focus — locomotion and bimanual manipulation are less well covered
  • Requires 24 GB+ GPU for full-precision inference
  • Needs task-specific fine-tuning for most production use

Use cases

  • Open baseline for generalist manipulation research
  • LoRA fine-tuning to new robots with small demonstration sets
  • Comparative evaluation of VLA design choices
  • Teaching and benchmarking embodied-AI curricula

Benchmarks

Benchmark | Score | As of
BridgeV2 / RT-2-X evaluation suite | ≈16 pp higher success than RT-2-X on average | 2024-06
LoRA fine-tune to new embodiment | matches from-scratch RT-2 with ~1% of compute | 2024-06

Frequently asked questions

What is OpenVLA?

OpenVLA is a 7B-parameter open-source vision-language-action model trained on the Open X-Embodiment robot dataset, released by Stanford, Google DeepMind, Toyota Research, and collaborators.

Why use OpenVLA instead of RT-2?

RT-2 weights are closed. OpenVLA is fully open, reproducible, and on common benchmarks slightly outperforms RT-2-X while being easier to fine-tune with LoRA.

What licence is OpenVLA under?

Weights and code are released under an MIT-style licence that allows research and commercial use, subject to the underlying data terms.

Sources

  1. OpenVLA project site — accessed 2026-04-20
  2. OpenVLA paper (arXiv) — accessed 2026-04-20