Microsoft Florence-2

Florence-2 is Microsoft's compact, open vision foundation model, released in 232M (base) and 771M (large) parameter variants and trained on the FLD-5B dataset of 5.4B annotations across 126M images. A single sequence-to-sequence architecture handles captioning, detection, segmentation, grounding, and OCR; the task is selected by swapping the task prompt. Released under the MIT licence on Hugging Face, it is widely used as a small-footprint vision backbone in research and product pipelines.

Model specs

Vendor: Microsoft Research
Family: Florence
Released: 2024-06
Context window: 1,024 tokens
Modalities: text, vision

Strengths

Unified prompt-driven interface across many vision tasks
Tiny footprint — runs on consumer GPUs
Permissive MIT licence
Strong quality-per-parameter thanks to FLD-5B pre-training

Limitations

Not a conversational VLM — no free-form chat
Fixed prompt grammar — out-of-template prompts under-perform
Shorter context and less fine-grained detail than 70B-class VLMs
Structured outputs (bounding boxes, masks) are generated as text and need post-processing into coordinates
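
To illustrate the post-processing point, here is a minimal sketch of turning raw generated text into pixel-space boxes. It assumes that detection-style output interleaves a label with four <loc_N> tokens per box and that coordinates are quantized into 1,000 bins per axis. The helper names parse_boxes and dequantize are hypothetical, and mapping each bin to its centre is an assumption. In practice the processor's own post_process_generation method is the safer choice.

```python
import re

# Assumption: output looks like "label<loc_x1><loc_y1><loc_x2><loc_y2>",
# with each coordinate quantized into NUM_BINS bins along its axis.
NUM_BINS = 1000
_BOX_PATTERN = re.compile(
    r"([^<>]*)<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>"
)


def dequantize(bin_index: int, size: int, num_bins: int = NUM_BINS) -> float:
    """Map a bin index to the pixel coordinate at the centre of that bin."""
    return (bin_index + 0.5) * size / num_bins


def parse_boxes(text: str, width: int, height: int) -> list[dict]:
    """Extract labelled pixel-space boxes from a raw generated string."""
    boxes = []
    last_label = ""
    for match in _BOX_PATTERN.finditer(text):
        # A box with no label of its own reuses the previous label.
        label = match.group(1).strip() or last_label
        x1, y1, x2, y2 = (int(match.group(i)) for i in range(2, 6))
        boxes.append({
            "label": label,
            "bbox": [
                dequantize(x1, width),
                dequantize(y1, height),
                dequantize(x2, width),
                dequantize(y2, height),
            ],
        })
        last_label = label
    return boxes


# Hypothetical raw output for a 1000 x 500 pixel image.
raw = "car<loc_100><loc_200><loc_500><loc_600>person<loc_0><loc_0><loc_999><loc_999>"
for box in parse_boxes(raw, width=1000, height=500):
    print(box)
```

Because the coordinates are relative to the quantization grid, the original image width and height must be carried alongside the generated text; without them the boxes cannot be mapped back to pixels.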

Use cases

Dataset labelling and auto-annotation
Light-weight visual grounding in agent pipelines
Edge / on-device OCR and captioning
Pre-processing stage feeding a downstream LLM
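
As a sketch of the pre-processing use case, the function below flattens already-parsed vision results into a plain-text context that a downstream LLM could read. The function name format_vision_context is hypothetical, and the input shapes (a caption string, an OCR string, and a dict with "labels" and "bboxes" lists) are assumptions modelled on the parsed outputs shown on the Hugging Face model card.

```python
def format_vision_context(caption: str, ocr_text: str, detections: dict) -> str:
    """Render caption, OCR text, and detections as one text block for an LLM."""
    lines = [f"Caption: {caption.strip()}"]
    if ocr_text.strip():
        lines.append(f"Text in image: {ocr_text.strip()}")
    labels = detections.get("labels", [])
    bboxes = detections.get("bboxes", [])
    for label, bbox in zip(labels, bboxes):
        # Round to whole pixels; sub-pixel precision rarely helps an LLM.
        x1, y1, x2, y2 = (round(v) for v in bbox)
        lines.append(f"Object: {label} at ({x1}, {y1}, {x2}, {y2})")
    return "\n".join(lines)


# Hypothetical parsed results for a single image.
context = format_vision_context(
    caption="A green car parked in front of a yellow building.",
    ocr_text="",
    detections={"labels": ["car"], "bboxes": [[34.2, 160.1, 597.4, 371.8]]},
)
print(context)
```

Keeping the vision stage's output as short, labelled lines makes it easy to prepend to an LLM prompt and to log for debugging.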

Benchmarks

Benchmark	Score	As of
COCO caption CIDEr (large)	≈140	2024-06
COCO detection mAP (large, fine-tuned)	≈43	2024-06
TextCaps CIDEr	≈78	2024-06

Frequently asked questions

What is Florence-2?

Florence-2 is a compact open vision foundation model from Microsoft that uses a single seq2seq architecture to perform captioning, detection, segmentation, grounding, and OCR — triggered by task prompts.
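
The task-prompt mechanism can be sketched as follows. The prompt strings come from the Hugging Face model card, but the list is not exhaustive. The helper names build_prompt and run_task and the short task keys are hypothetical, and the loading and generation calls follow the pattern shown on the model card at the time of writing, so exact arguments may differ across transformers versions.

```python
# Task prompts as listed on the model card (subset only).
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "detection": "<OD>",
    "grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "ocr": "<OCR>",
}


def build_prompt(task: str, text_input: str = "") -> str:
    """Return the task token, optionally followed by extra text (e.g. a phrase)."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(TASK_PROMPTS)}")
    return TASK_PROMPTS[task] + text_input


def run_task(image, task: str, text_input: str = "",
             model_id: str = "microsoft/Florence-2-base"):
    """Run one task on a PIL image and return the parsed result.

    Assumes the transformers and torch packages are installed and the
    model can be downloaded; imported lazily so build_prompt works without them.
    """
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    prompt = build_prompt(task, text_input)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    # Keep special tokens: location tokens are needed for post-processing.
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        raw, task=TASK_PROMPTS[task], image_size=(image.width, image.height)
    )


print(build_prompt("grounding", "a green car"))
```

Switching from captioning to detection changes only the task key passed to run_task; the model, processor, and generation call stay the same.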

How big is Florence-2?

There are two released variants: Florence-2-base with 232M parameters and Florence-2-large with 771M parameters, both under the MIT licence.
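
As a rough worked example of what these sizes mean in practice, the sketch below estimates the memory needed for the weights alone. It assumes 2 bytes per parameter in float16 and 4 bytes in float32, and ignores activations, image features, and framework overhead, so real usage will be higher. The function name weight_memory_gb is hypothetical.

```python
# Parameter counts as stated above.
PARAM_COUNTS = {
    "Florence-2-base": 232_000_000,
    "Florence-2-large": 771_000_000,
}
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(num_params: int, dtype: str = "float16") -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, count in PARAM_COUNTS.items():
    fp16 = weight_memory_gb(count, "float16")
    fp32 = weight_memory_gb(count, "float32")
    print(f"{name}: {fp16:.2f} GB (float16), {fp32:.2f} GB (float32)")
```

Under these assumptions the large variant needs roughly 1.5 GB for float16 weights, which is well within the memory of typical consumer GPUs.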

Is Florence-2 a chat model?

No. It is a task-prompt model, not a conversational VLM. For free-form chat over images, pair it with an LLM or use a model like Qwen2-VL.

Sources

Florence-2 on Hugging Face — accessed 2026-04-20
Florence-2 paper (arXiv) — accessed 2026-04-20