
Marco-o1

Marco-o1 is the Alibaba MarcoPolo Team's 2024 open-weight reasoning model, designed to replicate the step-by-step 'thinking' behaviour of OpenAI's o1 by running Monte Carlo Tree Search over reasoning trajectories at inference time. Built on Qwen2-7B-Instruct and fine-tuned on synthetic chain-of-thought data, it was an early public demonstration that small models can produce o1-style reasoning with smart inference-time search.
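To make "MCTS over reasoning trajectories" concrete, here is a minimal, hypothetical sketch of the technique in pure Python. Each tree node holds a partial chain of thought; `propose_step` and `score` are stand-in callbacks for the model's candidate-step generator and rollout reward, not Marco-o1's actual implementation, and all names here are illustrative.

```python
import math
import random

def mcts_reason(question, propose_step, score, n_sims=100, c=1.4, max_depth=4):
    """Toy MCTS over reasoning trajectories.

    Each node is a partial chain of thought (a list of strings);
    children extend it by one candidate reasoning step.
    """
    root = {"traj": [question], "children": [], "visits": 0, "value": 0.0}

    def uct(parent, child):
        # Standard UCT: mean reward (exploitation) plus an exploration
        # bonus that favours rarely-visited children.
        if child["visits"] == 0:
            return float("inf")
        return (child["value"] / child["visits"]
                + c * math.sqrt(math.log(parent["visits"]) / child["visits"]))

    for _ in range(n_sims):
        # 1. Selection: descend by UCT until reaching a leaf.
        node, path = root, [root]
        while node["children"]:
            node = max(node["children"], key=lambda ch: uct(path[-1], ch))
            path.append(node)
        # 2. Expansion: ask the model for candidate next steps.
        if len(node["traj"]) < max_depth:
            for step in propose_step(node["traj"]):
                node["children"].append({"traj": node["traj"] + [step],
                                         "children": [], "visits": 0,
                                         "value": 0.0})
            if node["children"]:
                node = random.choice(node["children"])
                path.append(node)
        # 3. Evaluation: score the rollout (e.g. average token confidence).
        reward = score(node["traj"])
        # 4. Backpropagation: update statistics along the path.
        for n in path:
            n["visits"] += 1
            n["value"] += reward
    # Commit to the most-visited first step, as in standard MCTS.
    best = max(root["children"], key=lambda ch: ch["visits"])
    return best["traj"]
```

With a mock step generator and a reward that favours "good" steps, the search reliably commits to the better first step, which is the essence of the approach: spend inference-time compute to pick among candidate reasoning paths.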

Model specs

Vendor: Alibaba
Family: Marco
Released: 2024-11
Context window: 32,000 tokens
Modalities: text

Strengths

  • Pioneering open demonstration of o1-style reasoning at 7B scale
  • Apache-licensed via the underlying Qwen2 base
  • Shows clear gains from MCTS-guided chain-of-thought

Limitations

  • Much higher inference cost due to tree search
  • Superseded by DeepSeek-R1 and OpenAI o3 on harder benchmarks
  • Only 7B parameters, so it struggles on deeply multi-step problems

Use cases

  • Open-source research on reasoning-via-search
  • Teaching Monte Carlo Tree Search applied to LLMs
  • Baselines for building custom test-time-compute systems
  • Math and logic tutors in Chinese/English
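For the research and baseline use cases above, the key ingredient beyond the search loop is how a rollout gets scored. The Marco-o1 paper describes (roughly) scoring each generated token by its softmax weight against the top-k alternative logits at that position, then averaging over the rollout; the sketch below is a simplified reading of that idea with made-up inputs, not the exact implementation.

```python
import math

def rollout_reward(chosen_logits, topk_logits):
    """Toy rollout reward in the spirit of Marco-o1's confidence score.

    chosen_logits: logit of the token actually generated at each position.
    topk_logits: for each position, the logits of the top-k candidates
                 (including the chosen token).
    Each token's confidence is exp(chosen) / sum(exp(top-k)); the reward
    is the mean confidence over the rollout.
    """
    confidences = []
    for chosen, alternatives in zip(chosen_logits, topk_logits):
        denom = sum(math.exp(a) for a in alternatives)
        confidences.append(math.exp(chosen) / denom)
    return sum(confidences) / len(confidences)
```

A rollout whose tokens clearly dominate their alternatives (peaked logits) scores higher than one where the model was torn between candidates, which is what lets MCTS prefer trajectories the model is confident in.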

Benchmarks

Benchmark         Score   As of
MGSM (English)    ~90%    2026-04
MGSM (Chinese)    ~82%    2026-04
GSM8K             ~89%    2026-04

Frequently asked questions

What is Marco-o1?

Marco-o1 is Alibaba MarcoPolo Team's open-weight reasoning LLM — built on Qwen2-7B-Instruct and enhanced with Monte Carlo Tree Search over chain-of-thought trajectories to imitate OpenAI's o1-style test-time reasoning.

How does Marco-o1 differ from DeepSeek-R1?

Marco-o1 is smaller (7B vs. R1's much larger MoE) and relies heavily on MCTS inference-time search, while DeepSeek-R1 is trained end-to-end with reinforcement learning on reasoning traces. R1 is stronger; Marco-o1 is a cheaper open research baseline.

Sources

  1. Marco-o1 on HuggingFace — accessed 2026-04-20
  2. Marco-o1 paper (arXiv) — accessed 2026-04-20