
Marco-o1

Marco-o1 is the Alibaba MarcoPolo Team's 2024 open-weight reasoning model, designed to replicate the step-by-step 'thinking' behaviour of OpenAI's o1 by running Monte Carlo Tree Search over reasoning trajectories at inference time. Built on Qwen2-7B-Instruct and fine-tuned on synthetic chain-of-thought data, it was an early public demonstration that small models can produce o1-style reasoning with smart inference-time search.
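To make "MCTS over reasoning trajectories" concrete, here is a minimal, hypothetical sketch of the technique in pure Python. Each tree node holds a partial chain of thought; `propose_step` and `score` are stand-in callbacks for the model's candidate-step generator and rollout reward, not Marco-o1's actual implementation, and all names here are illustrative.

```python
import math
import random

def mcts_reason(question, propose_step, score, n_sims=100, c=1.4, max_depth=4):
    """Toy MCTS over reasoning trajectories.

    Each node is a partial chain of thought (a list of strings);
    children extend it by one candidate reasoning step.
    """
    root = {"traj": [question], "children": [], "visits": 0, "value": 0.0}

    def uct(parent, child):
        # Standard UCT: mean reward (exploitation) plus an exploration
        # bonus that favours rarely-visited children.
        if child["visits"] == 0:
            return float("inf")
        return (child["value"] / child["visits"]
                + c * math.sqrt(math.log(parent["visits"]) / child["visits"]))

    for _ in range(n_sims):
        # 1. Selection: descend by UCT until reaching a leaf.
        node, path = root, [root]
        while node["children"]:
            node = max(node["children"], key=lambda ch: uct(path[-1], ch))
            path.append(node)
        # 2. Expansion: ask the model for candidate next steps.
        if len(node["traj"]) < max_depth:
            for step in propose_step(node["traj"]):
                node["children"].append({"traj": node["traj"] + [step],
                                         "children": [], "visits": 0,
                                         "value": 0.0})
            if node["children"]:
                node = random.choice(node["children"])
                path.append(node)
        # 3. Evaluation: score the rollout (e.g. average token confidence).
        reward = score(node["traj"])
        # 4. Backpropagation: update statistics along the path.
        for n in path:
            n["visits"] += 1
            n["value"] += reward
    # Commit to the most-visited first step, as in standard MCTS.
    best = max(root["children"], key=lambda ch: ch["visits"])
    return best["traj"]
```

With a mock step generator and a reward that favours "good" steps, the search reliably commits to the better first step, which is the essence of the approach: spend inference-time compute to pick among candidate reasoning paths.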

Model specs

Vendor: Alibaba
Family: Marco
Released: 2024-11
Context window: 32,000 tokens
Modalities: text

Strengths

  • Pioneering open demonstration of o1-style reasoning at 7B scale
  • Apache-licensed via the underlying Qwen2 base
  • Shows clear gains from MCTS-guided chain-of-thought

Limitations

  • Much higher inference cost due to tree search
  • Superseded by DeepSeek-R1 and OpenAI o3 on harder benchmarks
  • Only 7B parameters, so it struggles on deeply multi-step problems

Use cases

  • Open-source research on reasoning-via-search
  • Teaching Monte Carlo Tree Search applied to LLMs
  • Baselines for building custom test-time-compute systems
  • Math and logic tutors in Chinese/English
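For the research and baseline use cases above, the key ingredient beyond the search loop is how a rollout gets scored. The Marco-o1 paper describes (roughly) scoring each generated token by its softmax weight against the top-k alternative logits at that position, then averaging over the rollout; the sketch below is a simplified reading of that idea with made-up inputs, not the exact implementation.

```python
import math

def rollout_reward(chosen_logits, topk_logits):
    """Toy rollout reward in the spirit of Marco-o1's confidence score.

    chosen_logits: logit of the token actually generated at each position.
    topk_logits: for each position, the logits of the top-k candidates
                 (including the chosen token).
    Each token's confidence is exp(chosen) / sum(exp(top-k)); the reward
    is the mean confidence over the rollout.
    """
    confidences = []
    for chosen, alternatives in zip(chosen_logits, topk_logits):
        denom = sum(math.exp(a) for a in alternatives)
        confidences.append(math.exp(chosen) / denom)
    return sum(confidences) / len(confidences)
```

A rollout whose tokens clearly dominate their alternatives (peaked logits) scores higher than one where the model was torn between candidates, which is what lets MCTS prefer trajectories the model is confident in.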

Benchmarks

Benchmark         Score   As of
MGSM (English)    ~90%    2026-04
MGSM (Chinese)    ~82%    2026-04
GSM8K             ~89%    2026-04

Frequently asked questions

What is Marco-o1?

Marco-o1 is Alibaba MarcoPolo Team's open-weight reasoning LLM — built on Qwen2-7B-Instruct and enhanced with Monte Carlo Tree Search over chain-of-thought trajectories to imitate OpenAI's o1-style test-time reasoning.

How does Marco-o1 differ from DeepSeek-R1?

Marco-o1 is smaller (7B vs. R1's much larger MoE) and relies heavily on MCTS inference-time search, while DeepSeek-R1 is trained end-to-end with reinforcement learning on reasoning traces. R1 is stronger; Marco-o1 is a cheaper open research baseline.

Sources

  1. Marco-o1 on HuggingFace — accessed 2026-04-20
  2. Marco-o1 paper (arXiv) — accessed 2026-04-20