Marco-o1
Marco-o1 is Alibaba MarcoPolo Team's 2024 open-weight reasoning model, designed to replicate the step-by-step 'thinking' behaviour of OpenAI's o1 using Monte Carlo Tree Search over reasoning trajectories. Built on Qwen2-7B-Instruct and fine-tuned with chain-of-thought synthetic data, it was an early public demonstration that small models can produce o1-style reasoning with smart inference-time search.
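The search idea described above can be sketched with a toy Monte Carlo Tree Search over reasoning steps. This is an illustrative sketch only: `propose_steps` stands in for an LLM proposing candidate next reasoning steps, and `confidence` stands in for Marco-o1's confidence-based reward (the paper derives it from averaged token probabilities); none of these names come from the team's actual code.

```python
import math
import random

# Illustrative MCTS-over-reasoning-steps sketch. propose_steps and
# confidence are toy stand-ins, NOT Marco-o1's real model or reward.

def propose_steps(path, k=3):
    """Stand-in for an LLM proposing k candidate next reasoning steps."""
    return [f"{'/'.join(path)}>s{i}" for i in range(k)]

def confidence(path):
    """Stand-in for a confidence reward; deterministic per path."""
    return random.Random("|".join(path)).random()

class Node:
    def __init__(self, path, parent=None):
        self.path, self.parent = path, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        # Standard UCB1: unvisited nodes are explored first.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(iters=200, depth=3):
    root = Node([])
    for _ in range(iters):
        node = root
        # 1. Selection: descend by UCB until an unexpanded node.
        while node.children:
            node = max(node.children, key=Node.ucb)
        # 2. Expansion: add candidate next steps from the "model".
        if node.visits > 0 and len(node.path) < depth:
            node.children = [Node(node.path + [s], node)
                             for s in propose_steps(node.path)]
            node = node.children[0]
        # 3. Rollout: greedily extend the chain to full depth, then score it.
        path = list(node.path)
        while len(path) < depth:
            path.append(max(propose_steps(path),
                            key=lambda s: confidence(path + [s])))
        reward = confidence(path)
        # 4. Backpropagation: update visit counts and values up to the root.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # The most-visited first step is the chosen reasoning move.
    return max(root.children, key=lambda n: n.visits).path[0]

print(mcts())
```

The sketch shows why tree search raises inference cost: each iteration calls the "model" several times (expansion plus rollout), so a single answer costs many forward passes instead of one.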
Model specs
- Vendor: Alibaba
- Family: Marco
- Released: 2024-11
- Context window: 32,000 tokens
- Modalities: text
Strengths
- Pioneering open demonstration of o1-style reasoning at 7B scale
- Apache-licensed via the underlying Qwen2 base
- Shows clear gains from MCTS-guided chain-of-thought
Limitations
- Much higher inference cost due to tree search
- Superseded by DeepSeek-R1 and OpenAI o3 on harder benchmarks
- Only 7B parameters, so it struggles on deeply multi-step problems
Use cases
- Open-source research on reasoning-via-search
- Teaching Monte Carlo Tree Search applied to LLMs
- Baselines for building custom test-time-compute systems
- Math and logic tutors in Chinese/English
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MGSM (English) | ~90% | 2026-04 |
| MGSM (Chinese) | ~82% | 2026-04 |
| GSM8K | ~89% | 2026-04 |
Frequently asked questions
What is Marco-o1?
Marco-o1 is Alibaba MarcoPolo Team's open-weight reasoning LLM, built on Qwen2-7B-Instruct and enhanced with Monte Carlo Tree Search over chain-of-thought trajectories to imitate OpenAI's o1-style test-time reasoning.
How does Marco-o1 differ from DeepSeek-R1?
Marco-o1 is smaller (7B vs. R1's much larger MoE) and relies heavily on MCTS inference-time search, while DeepSeek-R1 is trained end-to-end with reinforcement learning on reasoning traces. R1 is stronger; Marco-o1 is a cheaper open research baseline.
Sources
- Marco-o1 on HuggingFace — accessed 2026-04-20
- Marco-o1 paper (arXiv) — accessed 2026-04-20