Microsoft Florence-2

Florence-2 is Microsoft's compact, open vision foundation model, released in two sizes: 232M parameters (base) and 771M parameters (large). It is trained on FLD-5B, a dataset of 5.4B annotations across 126M images. A single sequence-to-sequence architecture handles captioning, detection, segmentation, grounding, and OCR, with the task selected by the prompt. The weights are released under the MIT licence on Hugging Face, and the model is commonly used as a small-footprint vision component in research and product pipelines.
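To illustrate what "swapping task prompts" means in practice, here is a minimal sketch of a prompt builder. The task tokens are the ones listed on the Florence-2 Hugging Face model card; the short task names, the `TASKS_WITH_TEXT_INPUT` set, and the `build_prompt` helper are hypothetical and exist only for this example.

```python
# Task tokens as listed on the Florence-2 Hugging Face model card.
# The dictionary keys are illustrative names, not part of any API.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "referring_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
}

# Tasks whose prompt is the task token followed by free text
# (for example, the phrase to ground in the image).
TASKS_WITH_TEXT_INPUT = {"phrase_grounding", "referring_segmentation"}


def build_prompt(task: str, text: str = "") -> str:
    """Return the prompt string for a task (hypothetical helper)."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_PROMPTS[task]
    if task in TASKS_WITH_TEXT_INPUT:
        if not text:
            raise ValueError(f"task {task!r} needs a text input")
        return token + text
    if text:
        raise ValueError(f"task {task!r} takes no text input")
    return token


print(build_prompt("caption"))
print(build_prompt("phrase_grounding", "a green car"))
```

Rejecting unknown tasks up front reflects the fixed prompt grammar: the model is trained on these templates, so anything outside them is better caught before inference than discovered in poor output.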

Model specs

Vendor: Microsoft Research
Family: Florence
Released: 2024-06
Context window: 1,024 tokens
Modalities: text, vision

Strengths

  • Unified prompt-driven interface across many vision tasks
  • Tiny footprint — runs on consumer GPUs
  • Permissive MIT licence
  • Strong quality-per-parameter thanks to FLD-5B pre-training

Limitations

  • Not a conversational VLM — no free-form chat
  • Fixed prompt grammar — out-of-template prompts under-perform
  • Smaller context window and less fine-grained detail than 70B-class VLMs
  • Needs post-processing for structured outputs (bboxes, masks)
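The post-processing point can be made concrete with a small sketch. Florence-2 emits coordinates as location tokens quantized into 1,000 bins (per the paper), so a detection result arrives as text that has to be parsed back into pixel boxes. The `parse_boxes` helper, the regular expression, and the bin-centre dequantization below are assumptions for illustration; in practice the model's Hugging Face processor provides its own post-processing, which should be preferred.

```python
import re

NUM_BINS = 1000  # the Florence-2 paper quantizes coordinates into 1,000 bins

# One label followed by exactly four location tokens (x0, y0, x1, y1).
BOX_PATTERN = re.compile(
    r"([^<>]+)<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>"
)


def parse_boxes(text: str, width: int, height: int) -> list:
    """Parse 'label<loc_x0><loc_y0><loc_x1><loc_y1>' spans into pixel boxes.

    Hypothetical helper: maps each bin index to the centre of its bin,
    which is an assumption about the dequantization scheme.
    """
    results = []
    for match in BOX_PATTERN.finditer(text):
        label = match.group(1).strip()
        x0, y0, x1, y1 = (int(g) for g in match.groups()[1:])
        box = [
            (x0 + 0.5) * width / NUM_BINS,
            (y0 + 0.5) * height / NUM_BINS,
            (x1 + 0.5) * width / NUM_BINS,
            (y1 + 0.5) * height / NUM_BINS,
        ]
        results.append({"label": label, "bbox": box})
    return results


raw = "car<loc_100><loc_200><loc_500><loc_800>person<loc_0><loc_0><loc_999><loc_999>"
for item in parse_boxes(raw, width=1000, height=500):
    print(item["label"], item["bbox"])
```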

Use cases

  • Dataset labelling and auto-annotation
  • Light-weight visual grounding in agent pipelines
  • Edge / on-device OCR and captioning
  • Pre-processing stage feeding a downstream LLM
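For the auto-annotation use case, a rough sketch of turning detections into COCO-style annotation records follows. It assumes the detections arrive as a dict with `bboxes` (in `[x0, y0, x1, y1]` pixel form) and `labels`, which is the shape shown for the `<OD>` task on the Hugging Face model card. The `to_coco_annotations` helper and the `category_ids` mapping are hypothetical.

```python
def to_coco_annotations(detections: dict, image_id: int,
                        category_ids: dict, start_id: int = 1) -> list:
    """Convert {'bboxes': [[x0, y0, x1, y1], ...], 'labels': [...]} into
    COCO-style annotation dicts. Labels missing from category_ids are skipped.
    """
    annotations = []
    next_id = start_id
    for box, label in zip(detections["bboxes"], detections["labels"]):
        if label not in category_ids:
            continue  # unknown class: leave it for manual review
        x0, y0, x1, y1 = box
        w, h = x1 - x0, y1 - y0
        annotations.append({
            "id": next_id,
            "image_id": image_id,
            "category_id": category_ids[label],
            "bbox": [x0, y0, w, h],  # COCO stores [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
        next_id += 1
    return annotations


detections = {
    "bboxes": [[10.0, 20.0, 110.0, 70.0], [0.0, 0.0, 5.0, 5.0]],
    "labels": ["car", "unicorn"],
}
print(to_coco_annotations(detections, image_id=7, category_ids={"car": 3}))
```

Machine-generated labels like these are usually treated as a first pass and reviewed by a person before they are used for training.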

Benchmarks

Benchmark                               Score   As of
COCO caption CIDEr (large)              ≈140    2024-06
COCO detection mAP (large, zero-shot)   ≈43     2024-06
TextCaps CIDEr                          ≈78     2024-06

Frequently asked questions

What is Florence-2?

Florence-2 is a compact, open vision foundation model from Microsoft. It uses a single seq2seq architecture to perform captioning, detection, segmentation, grounding, and OCR, with each task selected by a task prompt.

How big is Florence-2?

There are two released variants: Florence-2-base with 232M parameters and Florence-2-large with 771M parameters, both under the MIT licence.
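As a back-of-the-envelope check on what these sizes mean for hardware, the sketch below estimates the memory needed to hold the weights alone. The parameter counts are the ones stated above; the `weight_memory_gb` helper is hypothetical, and the estimate ignores activations, the image encoder's working memory, and framework overhead.

```python
# Parameter counts as stated for the two released variants.
PARAMS = {"Florence-2-base": 232e6, "Florence-2-large": 771e6}

# Bytes used to store one parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, count in PARAMS.items():
    for dtype in ("fp32", "fp16"):
        print(f"{name} {dtype}: {weight_memory_gb(count, dtype):.2f} GB")
```

In half precision this comes to roughly 0.46 GB for the base model and 1.54 GB for the large model, which is why both fit comfortably on consumer GPUs.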

Is Florence-2 a chat model?

No. It is a task-prompt model, not a conversational VLM. For free-form chat over images, pair it with an LLM or use a model like Qwen2-VL.
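One way to pair Florence-2 with an LLM is to flatten its structured outputs into plain text that the LLM receives as context. The sketch below shows only that formatting step, under the assumption that a caption, a detection dict with `labels` and `bboxes`, and optional OCR text have already been produced. The `describe_image_for_llm` helper and the output layout are hypothetical.

```python
def describe_image_for_llm(caption: str, detections: dict,
                           ocr_text: str = "") -> str:
    """Render vision outputs as a text block for a downstream LLM prompt."""
    lines = [f"Caption: {caption.strip()}"]
    labels = detections.get("labels", [])
    bboxes = detections.get("bboxes", [])
    for label, box in zip(labels, bboxes):
        # Round to whole pixels to keep the prompt short.
        coords = ", ".join(str(round(v)) for v in box)
        lines.append(f"Object: {label} at [{coords}]")
    if ocr_text.strip():
        lines.append(f"Visible text: {ocr_text.strip()}")
    return "\n".join(lines)


context = describe_image_for_llm(
    "A car parked next to a stop sign.",
    {"labels": ["car"], "bboxes": [[10.4, 20.6, 110.0, 70.0]]},
    ocr_text="STOP",
)
print(context)
```

The resulting text can be placed ahead of the user's question in the LLM prompt, so the LLM answers from the described scene rather than from the image itself.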

Sources

  1. Florence-2 on Hugging Face — accessed 2026-04-20
  2. Florence-2 paper (arXiv) — accessed 2026-04-20