Curiosity

AI Models

Frontier and open-weights LLMs — what each one is best at, what they cost, and when you should reach for which. Updated as the field moves.

196 entries · Sorted A→Z

Adobe Firefly Image 3

Firefly Image 3 is Adobe's commercially-safe generative image model, trained on licensed Adobe Stock content and deeply integrated into Photoshop, Illustrator, and Express.

AI Scientist v2

Sakana AI's AI Scientist v2 is an autonomous research agent that generates, runs, and writes up machine-learning experiments end-to-end.

all-mpnet-base-v2

all-mpnet-base-v2 is sentence-transformers' most widely used open English embedding model — a 110M MPNet fine-tune that has been the default RAG encoder for years.

AssemblyAI Universal-2

AssemblyAI Universal-2 is a batch-first speech-to-text model with state-of-the-art English WER and built-in LeMUR LLM features for summaries, chapters, and Q&A.

Aya 23 35B

Aya 23 35B is Cohere For AI's 2024 open-weights multilingual model — a 35-billion-parameter decoder built on Command R, tuned across 23 languages.

Aya Expanse 32B

Aya Expanse 32B is Cohere For AI's follow-up multilingual open-weights model — a 32B Command-family decoder covering 23 languages with state-of-the-art per-language quality.

BAAI BGE Reranker v2-M3

BGE Reranker v2-M3 is BAAI's open-weight multilingual cross-encoder reranker — pairs naturally with BGE-M3 embeddings for a fully open-source RAG pipeline.

BAAI BGE-M3

BGE-M3 is BAAI's open-weight multilingual embedding model — one backbone producing dense, sparse, and multi-vector retrievals over 100+ languages with 8k context.

Baichuan 4

Baichuan Intelligent's Baichuan 4 is a closed Chinese LLM with 192k context, strong reasoning and bilingual performance, widely used in Chinese enterprise.

BART Large

BART Large is Meta AI's classic 2019 sequence-to-sequence transformer — a bidirectional-encoder, autoregressive-decoder model used for summarisation, translation, and text generation.

Black Forest Labs FLUX.1 [dev]

FLUX.1 [dev] is Black Forest Labs' open-weight 12B diffusion transformer — near-[pro] quality for research and non-commercial use, with a growing LoRA ecosystem.

Black Forest Labs FLUX.1 [pro]

FLUX.1 [pro] is Black Forest Labs' flagship closed text-to-image model — state-of-the-art prompt adherence and photorealism, served via bfl.ai and partner APIs.

BloombergGPT

BloombergGPT is a 50-billion-parameter finance-specialised LLM trained on Bloomberg's proprietary financial corpus — a landmark domain model for finance NLP.

Cartesia Sonic

Sonic is Cartesia's low-latency text-to-speech model built on state-space-model (Mamba-style) architectures — sub-90 ms time-to-first-audio for real-time voice agents.

ChatGPT 4o Canvas

ChatGPT 4o Canvas is OpenAI's side-by-side writing and coding surface — a GPT-4o variant tuned for inline edits, structured document drafting, and collaborative code review in the ChatGPT app.

Claude 2.1

Claude 2.1 is Anthropic's late-2023 flagship — introduced the 200K-token context window and improved refusal behaviour. Now a legacy model referenced mostly for benchmark comparisons.

Claude 3 Haiku

Claude 3 Haiku is Anthropic's original March 2024 small, fast, cheap model — the first Haiku tier, still widely deployed in legacy pipelines despite being surpassed by Haiku 3.5 and 4.5.

Claude 3 Opus

Claude 3 Opus is Anthropic's March 2024 flagship — the original Opus tier that established Claude as a GPT-4-class frontier model with strong long-context and reasoning performance.

Claude 3 Sonnet

Claude 3 Sonnet is Anthropic's March 2024 mid-tier model — the original Sonnet that balanced cost and quality in the Claude 3 launch before 3.5 Sonnet redefined the tier.

Claude 3.5 Haiku

Claude 3.5 Haiku is Anthropic's November 2024 small model — fast, cheap, and the first Haiku to match or beat Claude 3 Opus on several coding and reasoning benchmarks.

Claude 3.5 Sonnet

Claude 3.5 Sonnet is the June 2024 model that made Claude famous for coding — state-of-the-art SWE-bench at launch, tool use, vision, and the first computer-use preview.

Claude 3.7 Sonnet

Claude 3.7 Sonnet is Anthropic's February 2025 hybrid reasoning model — the first Claude with extended thinking, mixing fast responses and long chain-of-thought in one model.

Claude Code

Claude Code is Anthropic's official agentic command-line product — a terminal-first coding agent built on the Claude models, with native tool use, file editing, and git integration.

Claude Haiku 4.5

Claude Haiku 4.5 is Anthropic's fast, low-cost 2025 model — matches Sonnet 4 on many tasks at about one-third the price and double the speed, ideal for sub-tasks and real-time UX.

Claude Instant 1.2

Claude Instant 1.2 is Anthropic's 2023 low-latency chat model — the cheap, fast sibling of Claude 1. Deprecated in favour of the Haiku line but still referenced in many legacy apps.

Claude Opus 4.7

Claude Opus 4.7 is Anthropic's top-tier model for long-context reasoning, code generation, and agentic workflows. 1M context, native tool use, strong on SWE-bench.

Claude Sonnet 4.5

Claude Sonnet 4.5 is Anthropic's September 2025 Sonnet refresh — a best-in-class coding model at the time with 200K context, extended thinking, and strong agent behaviour.

Claude Sonnet 4.6

Claude Sonnet 4.6 is Anthropic's everyday-workhorse model — balances quality and cost, 1M context, strong coding and tool use, and powers most Claude-based production apps in 2026.

Code Llama 13B

Code Llama 13B is Meta's 13-billion-parameter open-weights code-generation model — a Llama 2 fine-tune for Python, infilling, and instruction-following coding tasks.

Code Llama 70B

Code Llama 70B is Meta's code-specialized fine-tune of Llama 2 70B — a historical landmark for open-source coding models, now superseded by newer open coders like DeepSeek Coder V2 and Qwen Coder.

Codestral

Codestral is Mistral AI's code-specialized open-weights model — trained on 80+ programming languages with strong fill-in-the-middle support, shipped under the Mistral Non-Production License.

Cohere Embed v3

Cohere Embed v3 is a multilingual retrieval embedding model with input-type prompts (search_document, search_query) and strong BEIR scores for enterprise RAG.

Cohere Rerank 3

Cohere Rerank 3 is a cross-encoder reranker for RAG — score (query, document) pairs to boost top-k relevance after a first-stage embedding retrieval.

Cohere Rerank 3 (Multilingual)

Cohere Rerank 3 Multilingual is a cross-encoder reranking model over 100+ languages — reorders retrieval hits by query relevance for RAG and search at low latency.

Command R

Command R is Cohere's RAG-first production LLM — a mid-size model tuned for grounded answers with citations, tool use, and multilingual enterprise deployments.

Command R+

Command R+ is Cohere's 104B open-weights model purpose-built for RAG and tool-use — strong citation quality and multilingual support under the CC-BY-NC research license.

DALL·E 2

DALL·E 2 is OpenAI's 2022 text-to-image diffusion model that popularised prompt-based image generation with unCLIP — a CLIP-guided prior plus cascaded diffusion decoder.

DBRX Instruct

Databricks DBRX Instruct is a 132B-parameter open-weight MoE model (36B active) trained on 12T tokens, optimised for enterprise data and lakehouse RAG.

Deepgram Nova-3

Deepgram Nova-3 is a streaming-first speech-to-text model — sub-300 ms real-time transcription with diarisation, keyterm prompting, and strong accented-English WER.

DeepMind AlphaProof

AlphaProof is Google DeepMind's AI math-proof system that achieved silver-medal IMO performance — Gemini-trained reinforcement learning over Lean 4 theorem-proving environments.

DeepSeek Coder 33B Instruct

DeepSeek Coder 33B Instruct is DeepSeek AI's 2023 open-weights coding LLM — a 33B dense decoder trained on 2T tokens of code, fluent in 80+ programming languages.

DeepSeek Coder V2

DeepSeek Coder V2 is the open-weights coding SOTA — a 236B parameter MoE (21B active) that matched closed-frontier coding models on HumanEval and LiveCodeBench.

DeepSeek LLM 67B

DeepSeek LLM 67B is DeepSeek AI's 2023 general-purpose open-weights model — a 67-billion-parameter dense decoder that served as the bilingual Chinese/English foundation for later DeepSeek releases.

DeepSeek R1

DeepSeek R1 is the first open-weights reasoning model to credibly compete with OpenAI o1 — MIT-licensed, with distilled variants down to 1.5B for local inference.

DeepSeek V2.5

DeepSeek V2.5 is the combined chat + coder unification of DeepSeek's V2 line — a 236B/21B-active MoE released in September 2024 that preceded the V3 breakthrough.

DeepSeek V3 is a 671B parameter open-weights Mixture-of-Experts model from Chinese AI lab DeepSeek — it matched GPT-4-class quality at a fraction of the training cost, reshaping open-source LLM expectations.

DeepSeek-Math 7B

DeepSeek-Math 7B is a specialised open-weight LLM trained on 120B math tokens, matching much larger models on MATH and GSM8K benchmarks.

DeepSeek-Prover V2

DeepSeek-Prover V2 is DeepSeek's open-weights formal theorem prover for Lean 4, trained with reinforcement learning and self-play — state-of-the-art on MiniF2F and PutnamBench.

DeepSeek-VL2

DeepSeek-VL2 is a family of mixture-of-experts vision-language models (3B / 16B / 27B total, 1B / 2.8B / 4.5B active) with strong OCR and grounding on a DeepSeekMoE backbone.

E5-Large v2

E5-Large v2 is Microsoft Research's open-weights English text embedding model — a ~335M-parameter MiniLM-derived encoder widely used as a strong, cheap baseline for retrieval.

ElevenLabs Multilingual v2

ElevenLabs Multilingual v2 is the leading text-to-speech model for expressive multilingual voice cloning — 29+ languages, voice design, and studio-grade dubbing.

Emu 2

Emu 2 is Meta's large multimodal generative model — a 37B parameter vision-language model capable of image generation, in-context editing, and multimodal reasoning.

Figure Helix (Figure 02)

Helix is Figure AI's generalist vision-language-action model for the Figure 02 humanoid — a dual-system architecture with a slow VLM planner and a fast 200 Hz visuomotor policy.

Gemini 1.5 Flash

Gemini 1.5 Flash is Google's May 2024 fast, cheap, 1M-context Flash tier — the first sub-$0.50/M token Gemini, widely deployed in 2024-25 for RAG and bulk pipelines.

Gemini 1.5 Pro

Gemini 1.5 Pro is Google's February 2024 long-context flagship — the model that popularised 1M (and briefly 2M) token context windows and native video understanding.

Gemini 2.0 Flash

Gemini 2.0 Flash is Google's December 2024 agent-oriented model — native tool use, multimodal input + output, and 1M context at Flash-tier cost.

Gemini 2.0 Flash Thinking

Gemini 2.0 Flash Thinking is Google's experimental December 2024 reasoning model — a 2.0 Flash variant that exposes chain-of-thought for math, science, and coding.

Gemini 2.5 Flash

Gemini 2.5 Flash is Google's fast, low-cost 2025 workhorse — a thinking model with 1M context, native multimodality, and strong price/performance on Vertex AI and the Gemini API.

Gemini 2.5 Pro

Gemini 2.5 Pro is Google's flagship long-context multimodal model — 2M tokens, excellent video/document understanding, and tight integration with Google Cloud and Workspace.

Gemini Embedding 001

Gemini Embedding 001 is Google's flagship text embedding model — 3,072-dim vectors, state-of-the-art MTEB multilingual scores, and 2K-token inputs for RAG and semantic search.

Gemini Ultra 1.0

Gemini Ultra 1.0 is Google DeepMind's original top-tier multimodal model — launched February 2024 as the MMLU-leading variant of the Gemini 1.0 family.

Gemma 2 2B

Google's Gemma 2 2B is a tiny 2.6-billion-parameter open-weight model, distilled from larger Gemma teachers, ideal for edge and browser inference.

Gemma 2 9B

Gemma 2 9B is Google's 2024 open-weights small model — a 9B dense transformer that punched above its weight on English reasoning benchmarks under the Gemma license.

Gemma 3 12B

Gemma 3 12B is Google DeepMind's open mid-size multimodal LLM with 128k context, vision input, and wide language coverage — a strong single-GPU alternative to Llama 3.1 8B.

Gemma 3 1B

Gemma 3 1B is Google's ultra-compact open-weights LLM — a ~1-billion-parameter model tuned for on-device inference, classroom experiments, and edge deployments.

Gemma 3 27B

Gemma 3 27B is Google's 2025 open-weights flagship in the Gemma family — a multimodal 27B model derived from Gemini research, with vision, long context, and the permissive Gemma license.

Gemma 3 4B

Gemma 3 4B is Google DeepMind's open 4B-parameter multimodal small LLM with 128k context, vision input, and 140+ language coverage — built on Gemini 2.0 research.

GLM-4 Plus

Zhipu AI's GLM-4 Plus is a Chinese flagship LLM with 128k context, strong on bilingual (Chinese/English) tasks, reasoning, and tool use.

Google DeepMind AlphaFold 3

AlphaFold 3 is Google DeepMind's biology model that predicts joint structures of proteins, DNA, RNA, ligands, and ions — a step-change for drug-discovery workflows.

Google MathGemma

MathGemma is Google DeepMind's math-specialised member of the Gemma family — fine-tuned on high-quality mathematics corpora for step-by-step reasoning and Lean proof sketching.

Google Med-PaLM 2

Med-PaLM 2 is Google Research's medical-specialist LLM — 86.5% on MedQA (US Medical Licensing Exam-style) and the reference for clinical-grade domain LLMs.

Google RT-2

RT-2 is Google DeepMind's vision-language-action (VLA) model that maps robot camera images and text instructions to low-level motor actions, generalising to novel objects and scenes.

Google Veo 2

Veo 2 is Google DeepMind's text-to-video model — 8-second 4K-capable clips with strong cinematic lighting and camera control, served via Vertex AI and Labs.

GPT Realtime

GPT Realtime is OpenAI's low-latency speech-to-speech model for voice agents — direct audio in, audio out, ~300ms turn-taking, function calling, and interruptions supported over WebRTC.

GPT-3.5 Turbo

GPT-3.5 Turbo is OpenAI's original production workhorse from the ChatGPT era — a fast, cheap 16K-context model that powered most LLM apps built between 2023 and 2024.

GPT-4 Turbo

GPT-4 Turbo is OpenAI's late-2023 flagship — a 128K-context GPT-4 variant with cheaper pricing, JSON mode, and vision input. Still widely used in legacy enterprise stacks.

GPT-4.1

GPT-4.1 is OpenAI's April 2025 refresh of GPT-4 — a 1M-context, instruction-following model built for coding, long-document work, and agent pipelines at lower cost than GPT-4o.

GPT-4o

GPT-4o is OpenAI's 2024 omni-modal flagship — a single model that natively handles text, vision, and audio with ~320ms voice latency and strong reasoning at lower cost than GPT-4 Turbo.

GPT-4o Vision

GPT-4o's native vision capability lets the omni-modal model read charts, screenshots, handwriting, and documents — the workhorse VLM behind ChatGPT's image-understanding features.

GPT-5

GPT-5 is OpenAI's 2026 flagship multimodal LLM — native audio/vision, unified reasoning modes, and deep ChatGPT + API integration. The default general-purpose model for most teams.

GPT-5 mini

GPT-5 mini is OpenAI's cost-efficient tier of the GPT-5 family — a unified reasoning-and-chat model that trades a small amount of quality for 5x lower price and faster responses.

GPT-5 nano

GPT-5 nano is OpenAI's cheapest and fastest GPT-5 tier — built for ultra-low-latency classification, routing, and high-volume workloads where quality-per-dollar trumps frontier reasoning.

GPT-5 Thinking

GPT-5 Thinking is OpenAI's flagship deliberate-reasoning mode — a variant of GPT-5 that spends extra inference tokens on hard math, code, and agent planning.

Grok 1.5

Grok 1.5 is xAI's March 2024 upgrade over Grok-1, extending context to 128k and significantly improving reasoning, math, and code performance.

Grok 2

Grok 2 is xAI's second-generation chat model — a frontier-tier LLM with image understanding and X (Twitter) real-time retrieval, released August 2024.

Grok 2 Vision

Grok 2 Vision is xAI's 2024 multimodal LLM adding image understanding to the Grok line, with 32k context and competitive pricing for visual Q&A.

Grok 3

Grok 3 is xAI's 2025 flagship LLM, known for its 'Think' reasoning mode and live X integration. 128k context, strong on math and coding.

Grok 4

xAI's Grok 4 is Elon Musk's flagship reasoning LLM for 2026, with native tool use, a 256k context, and real-time X (Twitter) grounding via Grok-Search.

GTE-Qwen2 7B Instruct

GTE-Qwen2 7B Instruct is Alibaba DAMO's 7B-parameter open text-embedding model — topped the MTEB leaderboard at release, built on the Qwen 2 backbone for 4096-dim dense retrieval.

Hunyuan-Large

Tencent's Hunyuan-Large is a 389B-parameter open-weight MoE model (52B active) with 256k context, strong on Chinese tasks and math reasoning.

Ideogram v2

Ideogram v2 is the text-to-image model best known for in-image typography — readable posters, logos, and UI mockups that other diffusion models struggle to render.

Imagen 3

Imagen 3 is Google's text-to-image generation model — high-fidelity photorealism, strong typography, and SynthID watermarking, available via Vertex AI and the Gemini API.

InternVL 2.5

InternVL 2.5 is OpenGVLab's open multimodal model family (1B–78B) matching GPT-4o on MMMU through scaled training, test-time scaling, and long-chain reasoning.

Jamba 1.5 Large

Jamba 1.5 Large is AI21 Labs' open-weights hybrid SSM-Transformer model — a 398B total / 94B active MoE combining Mamba and attention layers with 256K context.

Janus Pro 7B

Janus Pro 7B is DeepSeek AI's open-weights unified multimodal model — a 7B transformer that both understands and generates images through decoupled visual encoders.

Japanese Stable LM 2

Japanese Stable LM 2 is Stability AI Japan's open-weights Japanese-language LLM — a 1.6B Japanese-specialised model built from the Stable LM 2 backbone.

Jina Embeddings v3

Jina Embeddings v3 is an open-weight multilingual embedding model with 8k context, task LoRAs, and Matryoshka output — strong MTEB with Apache-compatible licensing.

Jina Embeddings v4

Jina Embeddings v4 is Jina AI's multilingual multimodal embedding model — 3.8B params, Matryoshka dimensions, late-interaction and single-vector modes for text, image, and visual-document retrieval.

Jina Reranker v2

Jina Reranker v2 is an open-weight multilingual cross-encoder reranker — fast, code-aware, and designed to pair with Jina Embeddings v3 for hybrid RAG.

Kimi K2

Moonshot AI's Kimi K2 is a trillion-parameter MoE model with ultra-long context, strong Chinese/English reasoning, and agentic coding.

Kling 1.5

Kling 1.5 is Kuaishou's text-to-video diffusion-transformer model — one of the first public systems to reliably generate 2-minute 1080p videos with strong motion coherence.

Krea 1

Krea 1 is Krea AI's first in-house text-to-image foundation model — aesthetics-focused, with real-time creative controls and strong photorealism for design workflows.

Llama 3.1 405B Instruct

Llama 3.1 405B is Meta's open-weights flagship dense model — the first open release to credibly challenge closed-frontier GPT-4-class quality on reasoning and knowledge.

Llama 3.1 70B Instruct

Llama 3.1 70B Instruct is Meta's mid-flagship open-weights model from July 2024 — the production workhorse that powered most of the open-source LLM boom before Llama 3.3 superseded it.

Llama 3.1 8B Instruct

Llama 3.1 8B Instruct is Meta's small open-weights workhorse — an 8B dense model tuned for edge inference, laptops, and low-cost classification and summarization pipelines.

Llama 3.1 Nemotron 70B Instruct

Nemotron 70B Instruct is NVIDIA's fine-tune of Llama 3.1 70B with reward-model-driven post-training — open-weights, and notably strong on LMSYS Arena versus the Llama 3.1 70B base.

Llama 3.3 70B Instruct

Meta's Llama 3.3 70B is a drop-in upgrade to Llama 3.1 70B — matching 405B-level quality in a 70B body through better post-training. The pragmatic open-weights workhorse.

Llama 4 Maverick

Meta's open-weights Llama 4 Maverick delivers frontier-class reasoning at self-host economics. Ideal when weights access, data sovereignty, or local inference matters more than absolute SOTA.

Llama 4 Scout

Meta's Llama 4 Scout is the smaller, edge-friendly sibling of Maverick — a 17B active / 109B total Mixture-of-Experts model with long context, designed for single-GPU inference and efficient fine-tuning.

Llama Guard 3

Llama Guard 3 is Meta's open-weights content-moderation classifier — an 8B Llama fine-tune that labels prompts and responses against a configurable safety taxonomy.

LLaVA 1.6 34B

LLaVA 1.6 34B is an open-weight vision-language model combining Nous-Hermes-Yi-34B with a CLIP vision tower, a key reference point for open VLM research.

Luma Dream Machine

Luma Dream Machine is Luma AI's text-to-video model — fast 5-second generations with strong motion, image-to-video loops, and a public API for pipeline integration.

Lyria 2

Lyria 2 is Google DeepMind's second-generation text-to-music model — generates high-fidelity instrumental and vocal tracks from natural-language prompts.

Marco-o1

Alibaba's Marco-o1 is an open-weight reasoning LLM that applies o1-style chain-of-thought search using Monte Carlo Tree Search over reasoning trajectories.

Mathstral 7B

Mathstral 7B is Mistral AI's open-weights math specialist — a 7B Mistral fine-tune aligned with Project Numina to solve Olympiad-style problems with chain-of-thought.

Meta MobileLLM 1.5B

MobileLLM 1.5B is Meta's sub-billion / sub-2B small language model family optimised for on-device inference — deep-and-thin architecture, embedding sharing, and grouped-query attention.

Microsoft Florence-2

Florence-2 is Microsoft's open vision foundation model (0.23B / 0.77B) with a unified prompt-based interface for captioning, detection, segmentation, OCR, and grounding.

Midjourney v6.1

Midjourney v6.1 is the premier artistic text-to-image model — exceptional aesthetic quality accessed through Discord and the Midjourney web app rather than a public API.

MiniMax Hailuo

Hailuo is MiniMax's text- and image-to-video model — a diffusion-transformer that became a viral favourite for fluid motion, realistic physics, and cinematic camera work.

Mistral Codestral 22B

Codestral 22B is Mistral AI's open-weight code LLM — 22B parameters across 80+ programming languages with strong HumanEval and fill-in-the-middle for IDE autocomplete.

Mistral Embed

Mistral Embed is Mistral AI's general-purpose text embedding model — 1024 dimensions, strong English and French quality, served from la Plateforme alongside Mistral's LLMs.

Mistral Large 3

Mistral Large 3 is Mistral AI's European flagship — strong multilingual reasoning, function calling, and data-sovereignty-friendly deployment through Mistral La Plateforme and Azure.

Mistral NeMo 12B

Mistral NeMo 12B is a 12B open-weights model co-developed by Mistral and NVIDIA — Apache 2.0 licensed, multilingual, with 128K context for its size class.

Mistral Small 24B

Mistral Small 24B is Mistral AI's early-2025 open-weights mid-size model — a 24-billion-parameter dense decoder designed for strong reasoning per dollar on single-GPU servers.

Mistral Small 3

Mistral Small 3 is a 24B open-weights model from Mistral AI — Apache 2.0 licensed, optimized for low-latency inference on a single GPU, and competitive with larger Llama variants.

Mixtral 8x22B

Mixtral 8x22B is Mistral's flagship open-weights Mixture-of-Experts model — 141B total, 39B active per token, Apache 2.0 licensed with strong multilingual and coding ability.

Molmo 72B

Allen AI's Molmo 72B is an open-weight multimodal LLM trained on the fully open PixMo dataset, rivalling closed VLMs on visual reasoning.

MPT-30B

MosaicML's MPT-30B is a 2023 open-weight 30-billion-parameter transformer with 8k context, an early commercial-licence LLM still used as a baseline.

mxbai-rerank-large-v1

mxbai-rerank-large-v1 is mixedbread.ai's open cross-encoder reranking model — state-of-the-art open reranker on BEIR, Apache 2.0 licensed, drop-in replacement for Cohere Rerank.

Nemotron Mini 4B Instruct

Nemotron Mini 4B Instruct is NVIDIA's compact open-weights LLM tuned for on-device chat — a 4-billion-parameter Minitron-derived model optimised for low-latency RTX GPUs.

Nemotron Ultra 253B

Nemotron Ultra 253B is NVIDIA's top-tier open-weights reasoning LLM — a 253B Llama-family model tuned for enterprise reasoning, math, and code.

Nomic Embed Text v2

Nomic Embed Text v2 is an open-weight, fully-auditable multilingual embedding model with Matryoshka support and long-context retrieval — a transparent alternative to closed APIs.

NV-Embed v2

NV-Embed v2 is NVIDIA's open-weights English embedding model — a Mistral 7B fine-tune that topped the MTEB leaderboard with leading retrieval, classification, and STS scores.

NVIDIA Cosmos

NVIDIA Cosmos is a family of world foundation models that generate physics-aware video futures for training and evaluating physical-AI agents — robots, autonomous vehicles, and simulators.

OpenAI DALL·E 3

DALL·E 3 is OpenAI's text-to-image model integrated into ChatGPT and the OpenAI API — known for strong prompt adherence, readable text, and SDXL-era quality.

OpenAI o1

OpenAI o1 is the September 2024 reasoning model that launched the "thinking model" era — trained with reinforcement learning to produce long internal chains of thought before answering.

OpenAI o1 Pro

OpenAI o1 Pro is the top-tier variant of the o1 reasoning series — a slower, more deliberate thinking model that spends additional inference compute on hard math, science, and coding problems.

OpenAI o3

OpenAI o3 is the April 2025 successor to o1 — a reasoning model with tool use, vision, and dramatically better scores on ARC-AGI, SWE-bench, and graduate-level science benchmarks.

OpenAI o4-mini

o4-mini is OpenAI's small reasoning model — a fast, cheap thinking model that matches or beats o3 on many math and coding benchmarks at a fraction of the cost.

OpenAI Sora

Sora is OpenAI's text-to-video model — generates up to 20-second 1080p clips from prompts, reference images, or remix edits, served through sora.com for ChatGPT Plus users.

OpenAI text-embedding-3-large

OpenAI text-embedding-3-large is a 3072-dim retrieval embedding model with Matryoshka support — top MTEB scores and the default choice for production RAG on the OpenAI stack.

OpenAI text-embedding-3-small

OpenAI text-embedding-3-small is a 1536-dim embedding model optimised for throughput — the cheap default for large-scale RAG ingestion on the OpenAI API.

OpenAI TTS-1-HD

OpenAI TTS-1-HD is OpenAI's high-fidelity text-to-speech model — six built-in voices for audiobooks, voice agents, and low-latency speech UX on the OpenAI API.

OpenAI Whisper v3 (large-v3)

Whisper large-v3 is OpenAI's open-weight speech-to-text model — 99 languages with strong WER on accented speech; a default for open-source transcription pipelines.

OpenELM 3B

Apple's OpenELM 3B is an open, on-device-friendly LLM using layer-wise scaling, released with full training recipe and CoreML export in 2024.

OpenVLA

OpenVLA is a 7B-parameter open-source vision-language-action model trained on the Open X-Embodiment dataset — a permissively licensed robot foundation model for manipulation research.

Orca-Math 7B

Microsoft's Orca-Math 7B is a math-specialised small LLM fine-tuned on synthetic GPT-4-generated math dialogues and feedback, strong on GSM8K.

PaLM 2

PaLM 2 is Google's 2023 flagship dense decoder LLM — the successor to PaLM that powered the original Bard and Duet AI for Workspace. Now deprecated in favour of the Gemini family.

Phi-2

Microsoft's Phi-2 is a 2.7B-parameter 'small but mighty' LLM trained on textbook-quality data, demonstrating how data curation beats raw model scale.

Phi-3-mini 128k

Phi-3-mini 128k is Microsoft's 3.8B-parameter small language model with a 128k context window — a tiny, laptop-runnable LLM that matches GPT-3.5 on many benchmarks.

Phi-3.5 Mini

Phi-3.5 Mini is Microsoft's 3.8B open-weights tiny model — designed for on-device inference on phones and laptops with surprisingly capable reasoning for its size.

Phi-4

Phi-4 is Microsoft Research's 14B open-weights model focused on reasoning — trained with a synthetic-data-heavy recipe that punches far above its weight class on math, logic, and coding benchmarks.

Phi-4 Multimodal

Microsoft's Phi-4 Multimodal is a 5.6B SLM unifying text, vision, and speech in one compact model, tuned for on-device and edge inference.

Physical Intelligence π0

π0 (pi-zero) is Physical Intelligence's generalist robot foundation model — a flow-matching vision-language-action policy trained on diverse multi-embodiment data for dexterous manipulation.

Pika 2.0

Pika 2.0 is Pika Labs' text-to-video model with a signature 'Scene Ingredients' feature for compositing characters, objects, and locations across shots.

Pixtral 12B

Pixtral 12B is Mistral AI's first open-weights vision-language model — a 12B parameter multimodal transformer capable of image captioning, document VQA, and chart reasoning.

Prompt Guard 2

Prompt Guard 2 is Meta's open-weights small classifier for detecting prompt-injection and jailbreak attempts — a sidecar filter designed to sit in front of any LLM.

Qodo Gen 1

Qodo (formerly CodiumAI) Qodo Gen 1 is a specialised code-generation and test-writing LLM tuned for IDE-integrated review and unit-test synthesis.

Qwen 2.5 3B

Qwen 2.5 3B is Alibaba's compact open small language model — a 3B-parameter LLM with 128k context, tool-use training, and multilingual coverage in 29 languages.

Qwen 2.5 72B Instruct

Qwen 2.5 72B Instruct is Alibaba's 2024 open-weights flagship dense model — Apache 2.0 licensed, matching Llama 3.1 405B on many benchmarks at a 72B footprint.

Qwen 2.5 Coder 32B

Qwen 2.5 Coder 32B is Alibaba's open-weights coding flagship — a 32B dense model that matched GPT-4o on HumanEval at release and runs on a single H100.

Qwen 3

Qwen 3 is Alibaba's 2025 flagship open-weights family — dense and MoE variants from 0.6B to 235B, Apache 2.0 licensed, with strong multilingual and reasoning behavior.

Qwen QwQ 32B

Qwen QwQ 32B is Alibaba's open-weights reasoning model — a 32B dense variant trained with reinforcement learning that competes with DeepSeek R1 at a much smaller footprint.

Qwen2-Audio 7B

Qwen2-Audio 7B is Alibaba's open-weights audio-language model — a 7B transformer that accepts speech, music, and environmental sounds and responds in natural-language text.

Qwen2-VL 72B

Qwen2-VL 72B is Alibaba's flagship open vision-language model with dynamic-resolution visual encoding, strong OCR, and 20-minute video understanding on the Qwen 2 backbone.

Qwen2.5-Math 72B

Qwen2.5-Math 72B is Alibaba's open-weights math specialist — a 72-billion-parameter Qwen2.5 fine-tune with tool-augmented (Python) reasoning for Olympiad-class problems.

Qwen2.5-VL 72B

Qwen2.5-VL 72B is Alibaba's top-tier open-weights vision-language model — a 72B transformer with agentic UI grounding, long-video understanding, and precise document OCR.

Recraft V3

Recraft V3 is a closed text-to-image model known for industry-leading text rendering and vector-style outputs — the model that topped Artificial Analysis's image leaderboard on launch.

Reka Core

Reka AI's Reka Core is a 2024 frontier-tier multimodal LLM with image, video, and audio understanding plus 128k context and multilingual coverage.

Reka Flash 3

Reka AI's Reka Flash 3 is a 21B open-weight reasoning LLM released in 2025 with 32k context and strong performance-per-dollar for enterprise use.

Reka Vision

Reka AI's Reka Vision is a multimodal product for enterprise video and image understanding, built on the Reka Core/Flash models with retrieval-grade search.

Replit Code v3

Replit Code v3 is Replit's in-house code LLM powering Replit Agent and Ghostwriter, tuned for cloud-IDE completions and full-stack app synthesis.

Resemble Rapid Voice Cloning

Resemble AI's Rapid Voice Cloning creates a high-fidelity custom voice from 10 seconds of reference audio, paired with a watermarking stack for responsible synthetic speech.

Runway Gen-3 Alpha

Runway Gen-3 Alpha is Runway's flagship video generator for filmmakers — 10-second clips with strong character consistency and a polished editing UI.

Sakana Evolutionary Model Merge

Sakana AI's Evolutionary Model Merge is a research system that uses evolutionary algorithms to combine open-weights LLMs — automatically discovering high-performing merged checkpoints.

SeamlessM4T v2

SeamlessM4T v2 is Meta's massively multilingual and multimodal translation model — speech and text in and out across nearly 100 languages through a unified encoder-decoder stack.

SFR-Embedding-Mistral

SFR-Embedding-Mistral is Salesforce Research's open-weights English embedding model — a Mistral 7B fine-tune that led the MTEB leaderboard at release.

Shengshu Vidu

Vidu is Shengshu Technology and Tsinghua's text- and image-to-video model, based on the U-ViT diffusion-transformer — the first Chinese Sora-class public video generator.

Skywork-o1-Open

Skywork's Skywork-o1-Open is an open-weight reasoning model family (8B/32B) reproducing o1-style chain-of-thought with strong math and code performance.

Stable Audio 2

Stable Audio 2 is Stability AI's text-to-audio model — generates full-length (up to 3-minute) music and sound-effect tracks from text prompts with optional audio-to-audio conditioning.

Stable Cascade

Stable Cascade is Stability AI's three-stage cascaded text-to-image model based on the Würstchen architecture — efficient high-resolution generation in a tiny latent space.

Stable Code 3B

Stability AI's Stable Code 3B is a tiny 3-billion-parameter code LLM with FIM support, strong for offline IDE completions on commodity hardware.

Stable Diffusion 2.1

Stable Diffusion 2.1 is Stability AI's late-2022 text-to-image latent diffusion model — a 768x768 successor to SD 1.5 with OpenCLIP H/14 conditioning. Now a legacy baseline.

Stable Diffusion 3.5 Large

Stable Diffusion 3.5 Large is Stability AI's 8B-parameter MMDiT text-to-image model — open weights for research and community use with strong prompt adherence and typography.

Stable Diffusion XL 1.0

SDXL 1.0 is Stability AI's July 2023 open-weights text-to-image diffusion model — a 2.6B-parameter U-Net with a refiner, widely used as the default open image generator.

Stable LM 2 1.6B

Stability AI's Stable LM 2 1.6B is a tiny multilingual open-weight LLM trained on 2T tokens, strong for its size with 4k context.

Stable Video Diffusion

Stable Video Diffusion is Stability AI's image-to-video latent diffusion model — generates short, coherent video clips from a single still image using a Stable Diffusion backbone.

Suno v3.5

Suno v3.5 is Suno AI's 2024 music-generation model — produces full songs with vocals, lyrics, and production up to four minutes from a single text prompt.

text-embedding-ada-002 (legacy)

text-embedding-ada-002 is OpenAI's 2022 text-embedding model — a 1536-dim dense embedder that became the de facto default for early RAG systems. Now superseded by text-embedding-3-small/large.

TinyLlama 1.1B

TinyLlama is an open community effort to pretrain a 1.1B-parameter Llama-architecture model on 3T tokens — a compact, hackable, edge-friendly LLM.

Udio v1.5

Udio v1.5 is Udio's music-generation model from the ex-DeepMind team — text-to-music with rich audio fidelity, long-form generation, and detailed lyric control.

Veo 3

Veo 3 is Google DeepMind's May 2025 text-to-video model — generates 4K-capable clips with synchronized dialogue, ambient audio, and cinematic camera motion via Vertex AI and Gemini.

Vertex AI textembedding-gecko

Vertex AI textembedding-gecko is Google Cloud's managed text-embedding endpoint — a Gemini-era English embedding model exposed through Vertex AI for enterprise RAG.

VILA 1.5 40B

NVIDIA's VILA 1.5 40B is an open-weight visual language model with multi-image and video support, strong on in-context learning for visual tasks.

Voyage AI voyage-3

Voyage AI voyage-3 is a retrieval-first embedding model family — voyage-3 and voyage-3-lite — built for RAG, with domain-specialised variants for code, law, and finance.

Yi-Large

01.AI's Yi-Large is Kai-Fu Lee's flagship Chinese/English LLM, a closed-model 2024 release optimised for reasoning, multilingual chat, and enterprise RAG.