Jamba 1.5 Large
Jamba 1.5 Large is AI21 Labs' August 2024 open-weights flagship: a 398B-total / 94B-active hybrid MoE that interleaves Mamba state-space layers with Transformer attention. The hybrid design keeps long-context inference near-linear in cost while preserving attention quality, and the 256K window was the longest of any open-weights model at release.
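As a rough illustration of the interleaving: the original Jamba paper describes eight-layer blocks with one attention layer per seven Mamba layers and an MoE feed-forward on every other layer. The sketch below uses those ratios, which are assumptions carried over from that paper rather than confirmed Jamba 1.5 Large hyperparameters.

```python
# Illustrative layer schedule for a Jamba-style hybrid stack.
# The ratios (1 attention : 7 Mamba per 8-layer block, MoE every 2nd layer)
# follow the original Jamba paper and are ASSUMPTIONS here, not confirmed
# hyperparameters for Jamba 1.5 Large.

ATTN_EVERY = 8  # one attention layer per 8-layer block (assumption)
MOE_EVERY = 2   # MoE feed-forward on every other layer (assumption)

def layer_schedule(n_layers: int) -> list[str]:
    """Label each layer with its token mixer and feed-forward type."""
    labels = []
    for i in range(n_layers):
        mixer = "attention" if i % ATTN_EVERY == ATTN_EVERY - 1 else "mamba"
        ffn = "moe" if i % MOE_EVERY == MOE_EVERY - 1 else "dense"
        labels.append(f"{mixer}+{ffn}")
    return labels

for i, layer in enumerate(layer_schedule(8)):
    print(f"layer {i}: {layer}")
```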
Model specs
- Vendor: AI21 Labs
- Family: Jamba
- Released: 2024-08
- Context window: 256,000 tokens
- Modalities: text
- Input price: $2 / M tokens
- Output price: $8 / M tokens (see the cost sketch below)
- Pricing as of: 2026-04-20
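At the listed rates, a quick back-of-the-envelope cost sketch (the token counts are hypothetical):

```python
# Cost estimate at the listed rates: $2 per million input tokens,
# $8 per million output tokens. Token counts below are hypothetical.
IN_RATE_USD = 2.00
OUT_RATE_USD = 8.00

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * IN_RATE_USD + output_tokens / 1e6 * OUT_RATE_USD

# A full 256K-token prompt plus a 2K-token answer:
print(f"${request_cost_usd(256_000, 2_000):.3f}")  # $0.528
```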
Strengths
- Open weights under the Jamba Open Model License
- Hybrid architecture: Mamba layers keep long-context inference near-linear in cost
- Longest open-weights context window (256K) at its August 2024 release
- Strong retrieval scores on the RULER long-context benchmark
Limitations
- Hybrid inference stack is less mature than pure Transformers; fewer runtimes support it
- The custom Jamba Open Model License is permissive but not OSI-approved
- Trails Llama 3.1 405B and DeepSeek V3 on general reasoning benchmarks
- Community adoption is smaller than for the Llama / Qwen / Mistral families
Use cases
- Long-document Q&A and summarization at 256K context
- RAG pipelines benefiting from low-latency long-prompt processing
- Research on hybrid SSM+Transformer architectures
- Enterprise workloads with mixed structured and unstructured inputs
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU | ≈81% | 2024-08 |
| RULER 256K | ≈92% | 2024-08 |
| HumanEval | ≈71% | 2024-08 |
Frequently asked questions
What is Jamba 1.5 Large?
AI21 Labs' 398B-total / 94B-active open-weights hybrid MoE, released August 2024. It interleaves Mamba state-space layers with Transformer attention to keep long-context inference near-linear in cost within a 256K window.
Why hybrid SSM + Transformer?
Attention has quadratic cost in context length, while Mamba-style state-space models run in linear time. Jamba's hybrid approach keeps attention for quality while using Mamba layers to make 256K context economically viable.
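A toy comparison of how the two mixers scale with context length n (pure operation counts, not measured throughput):

```python
# Toy scaling comparison: attention forms an n x n score matrix (O(n^2)
# pairwise interactions), while a Mamba-style SSM makes one constant-size
# state update per token (O(n)). Illustrative counts only, not benchmarks.

def attention_ops(n: int) -> int:
    return n * n  # query-key pairs

def ssm_ops(n: int) -> int:
    return n      # sequential state updates

for n in (4_096, 32_768, 262_144):
    ratio = attention_ops(n) / ssm_ops(n)
    print(f"n={n:>7}: attention/ssm op ratio = {ratio:,.0f}x")
```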
Should I deploy Jamba instead of a pure Transformer?
Consider Jamba when very long context with fast inference is a primary requirement. For general chat and coding, mainstream Transformer MoEs like DeepSeek V3 or Llama 4 Maverick have richer tooling and slightly better benchmark quality.
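For experimentation, a minimal loading sketch, assuming a recent transformers release with Jamba support and hardware with enough memory for the checkpoint; the repo id matches the Hugging Face source below.

```python
# Minimal sketch: load Jamba 1.5 Large from Hugging Face and generate.
# Assumes a recent `transformers` with Jamba support and multi-GPU hardware;
# optional fused Mamba kernels can speed up inference but are not shown here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",  # keep the checkpoint's native precision
)

# Hypothetical prompt for illustration:
prompt = "Summarize the key obligations in the following contract:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```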
Sources
- AI21 Labs — Jamba 1.5 announcement — accessed 2026-04-20
- Hugging Face — ai21labs/AI21-Jamba-1.5-Large — accessed 2026-04-20