From LangChain to DSPy — the libraries that turn model APIs into production systems.

Accelerate is Hugging Face's lightweight wrapper around PyTorch that makes the same training script run on CPU, single GPU, multi-GPU, TPU, DeepSpeed, and FSDP with minimal config changes.
Agno (formerly phidata) is a high-performance Python framework for building multi-agent systems with memory, knowledge, tools, and reasoning — model-agnostic and optimised for low-latency instantiation.
Aider is a terminal-first AI pair-programmer that edits your git repo — it reads selected files, generates diffs from your natural-language requests, and commits the changes, all from the CLI.
The official Python SDK for Anthropic's Claude API, providing typed clients for messages, tool use, streaming, batch, files, prompt caching, and computer use.
Argilla is Hugging Face's open-source data-annotation and feedback platform for LLMs — SFT, DPO, RLHF datasets, eval datasets, and continuous human review all in a single UI.
Arize Phoenix is an open-source LLM observability and evaluation platform offering OpenTelemetry-compatible tracing, datasets, and experiments for AI applications.
AutoGen is Microsoft Research's open-source framework for building multi-agent conversational AI — asynchronous message-passing, layered APIs, and a visual AutoGen Studio for no-code agent design.
AutoGPT is Significant Gravitas's autonomous agent platform that chains LLM reasoning with tools, memory, and file I/O to accomplish open-ended goals.
Axolotl is a config-driven fine-tuning framework for open-weight LLMs — write one YAML file describing dataset, model, and training hyperparameters, and Axolotl handles SFT, DPO, ORPO, LoRA, and full-parameter runs.
BabyAGI is Yohei Nakajima's influential task-driven autonomous agent — a minimal Python loop that creates, prioritises, and executes tasks toward an objective.
BAML (Boundary's AI Modeling Language) is a schema-first DSL for defining typed LLM functions. You write function signatures in .baml files and BAML generates Python/TypeScript/Ruby clients with strict types, retries, and provider portability.
BentoML is an open-source framework for packaging, serving, and deploying AI models — from classic ML to LLMs — with BentoCloud providing managed hosting and autoscaling on AWS/GCP.
BIG-Bench Hard is a curated 23-task subset of BIG-Bench on which earlier language models failed to match average human-rater performance, widely used to measure chain-of-thought gains on LLMs.
BISHENG is an open-source LLM application platform from DataElem focused on enterprise document processing, workflows, and agents, popular in the Chinese market.
Braintrust is a commercial LLM evaluation platform that combines datasets, prompt playgrounds, automated scoring, and production observability — used by many US AI labs and startups to run systematic evals.
browser-use is the most popular open-source Python library for giving LLM agents control of a real Chromium browser — DOM-aware clicks, typing, and screenshots driven by tools like OpenAI, Anthropic, or Gemini.
Burr is DAGWorks' open-source Python framework for building LLM applications as state machines, with built-in tracing, persistence, and a web UI for debugging.
CAMEL is a pioneering open-source framework for multi-agent role-playing research, supporting a scalable society of agents for data generation and task solving.
Chonkie is a fast, lightweight Python chunking library for RAG, offering token, sentence, semantic, and late chunking strategies with a small dependency footprint.
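A minimal sketch of the fixed-size-with-overlap strategy that token chunkers implement — illustrative pure Python, not Chonkie's actual API; `chunk_tokens` is a hypothetical helper:

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into fixed-size chunks, with each chunk sharing
    `overlap` tokens with its predecessor so context survives the cut."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# 10 tokens, chunks of 4, 1 token of overlap between neighbours
chunks = chunk_tokens(list(range(10)), chunk_size=4, overlap=1)
```

Real libraries layer tokeniser-aware, sentence, and semantic boundaries on top of this basic windowing.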
Chroma is the most popular embedded open-source vector database — pip-install, run in-process, and scale up to a self-hosted or managed Chroma Cloud deployment when needed.
Codeium is the AI coding assistant and parent brand of the Windsurf IDE, offering autocomplete, chat, and agentic coding across 70+ IDEs — free for individuals, with enterprise self-hosted options.
Cody is Sourcegraph's AI coding assistant with deep code-graph context, agentic editing, autocomplete, and repo-wide chat — available as a VS Code / JetBrains plugin and a CLI.
Colossal-AI is HPC-AI Tech's open-source distributed-training library for large models — heterogeneous memory management, tensor/pipeline parallelism, and a RLHF stack called Colossal-Chat.
ColPali is a visual document retrieval model that indexes PDF pages as images using a vision-language model, eliminating traditional OCR-and-chunk pipelines.
Comet's LLM tooling (CometLLM and Opik) extends its ML experiment-tracking platform to LLM observability — prompt logging, evals, traces, and dashboards inside an existing Comet workspace.
Confident AI is the commercial cloud platform behind DeepEval — LLM evaluation, A/B testing, red-teaming, and continuous monitoring dashboards layered on top of the open-source DeepEval library.
Continue is an open-source AI coding assistant for VS Code and JetBrains — chat, autocomplete, and agent modes that work with any model (Claude, GPT, local via Ollama) and a config-first approach to customisation.
Crawl4AI is an open-source async Python crawler built specifically for LLM pipelines — it ships JS rendering via Playwright, chunking, extraction strategies, and outputs Markdown or structured JSON.
CrewAI is a role-based multi-agent framework for Python where you define agents, tasks, and crews that collaborate to accomplish goals — focused on simplicity and opinionated orchestration.
ctransformers is a Python binding for GGML-based transformer models (Llama, GPT-2, Falcon, MPT) with a scikit-learn-style API and a LangChain integration — an older alternative to llama-cpp-python.
Datadog LLM Observability is a managed product that correlates LLM traces, prompts, and evaluations with your existing infrastructure and APM monitoring.
DeepEval is an open-source Python framework for evaluating LLM applications — 40+ metrics (G-Eval, faithfulness, hallucination, toxicity, RAG-specific), pytest integration, and red-teaming for safety.
DeepSpeed is Microsoft Research's deep-learning optimisation library — ZeRO memory sharding, pipeline parallelism, mixed precision, and inference kernels that make training and serving trillion-parameter models tractable.
Dify is an open-source LLM application platform combining visual workflow building, RAG, agent tools, and backend hosting into a single BaaS-style product.
Distilabel is Argilla's open-source framework for generating and labelling synthetic data for LLM training — DAG-based pipelines, distillation, UltraFeedback, self-instruct, and DPO pair generation.
Docling is IBM Research's open-source document parser that converts PDFs, DOCX, HTML, and images into clean Markdown or JSON for LLM and RAG pipelines.
DSPy is Stanford's framework for programming — not prompting — LLMs. You declare modules and signatures in Python, and DSPy optimises the prompts and few-shot examples against your metric.
DVC (Data Version Control) is Iterative's Git-based tool for versioning datasets, models, and LLM pipelines — reproducible experiments, lineage, and remote storage for fine-tuning and evals.
lm-evaluation-harness is EleutherAI's de facto standard framework for evaluating language models across 200+ benchmarks (MMLU, GSM8K, HellaSwag, ARC, TruthfulQA) with reproducible configs.
ell is a lightweight Python library that treats prompts as versioned pure functions — decorator-based prompt definitions, auto-versioning, and a local studio for inspecting every invocation as a first-class artefact.
Firecrawl is an open-source and hosted service that crawls websites and returns clean Markdown or structured JSON — purpose-built for feeding LLM pipelines with renderable, up-to-date web content.
Fireworks AI is a fast hosted inference service for open-source models, with an OpenAI-compatible SDK, LoRA hot-swapping, and custom fine-tuning — optimised for latency and cost.
Flowise is an open-source drag-and-drop UI for building LangChain-based LLM flows, chatbots, and agents — deployable as hosted API with a visual canvas.
Galileo is an enterprise GenAI observability and evaluation platform — LLM-as-judge metrics, guardrail policies, and production-grade drift detection aimed at regulated industries shipping real-money AI.
Genkit is Google's open-source framework for building production GenAI apps, with SDKs in JavaScript/TypeScript, Go, and Python, tightly integrated with Firebase and Vertex AI.
Giskard is an open-source testing framework for ML and LLM applications — detects biases, hallucinations, injection vulnerabilities, and data drift with an automated scan that generates test suites and CI checks.
GitHub Copilot CLI is a terminal-native AI assistant that explains, suggests, and runs shell commands with confirmation — part of the wider GitHub Copilot product line.
GPT-Engineer is an open-source CLI agent by Anton Osika that generates and iteratively improves entire codebases from a natural-language prompt.
gptme is a terminal-based personal AI assistant that can execute shell commands, edit files, run Python, and browse the web — a minimal, local-first alternative to Aider and Open Interpreter with broad LLM support.
Griptape is a Python framework for building AI agents and pipelines with a Structures API (Agents, Pipelines, Workflows), first-class RAG, and an opinionated off-prompt data approach that keeps sensitive data out of LLM context.
Guidance is Microsoft's structured generation library for controlling LLM output with interleaved prompts, constraints, and regex/CFG guidance — originally designed to work with local models where token-level logit access is possible.
Haystack is deepset's open-source Python framework for building production LLM applications — composable pipelines for RAG, agents, and document processing with strong typing and evaluation.
Haystack Agents is deepset's agentic module inside Haystack 2.x — tool-using LLM agents that plug into Haystack's pipeline graph for RAG, search, and production enterprise workflows.
Helicone is an open-source LLM observability platform and gateway that captures every request, logs prompts and responses, computes costs, and surfaces performance issues — deployable as SaaS or self-hosted.
Hugging Face Inference Endpoints is a managed service that deploys any Hub model as a secure, autoscaled HTTPS endpoint on AWS, Azure, or GCP — with TGI for LLMs and Inference Toolkit for the long tail.
EvalPlus is a rigorously extended version of OpenAI's HumanEval and Google's MBPP code-generation benchmarks, with 80x more test cases that catch silent failures — a reference benchmark for code LLMs.
Humanloop is a hosted LLM engineering platform offering prompt management, evaluations, datasets, and observability for production AI applications.
Inspect AI is the UK AI Safety Institute's open-source evaluation framework, designed for large-scale AI safety and capability benchmarks — dataset-driven, with scorers, solvers, and tool-use evals.
Instructor is the most popular Python library for getting structured, validated outputs from LLMs — patches OpenAI-compatible clients to return Pydantic models directly, with retries and partial streaming.
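The core pattern a library like Instructor automates — call, validate against a schema, feed the validation error back, retry — can be sketched in plain Python. Everything here (`structured_call`, `fake_llm`, `validate_person`) is a hypothetical stand-in, not Instructor's API:

```python
import json

def structured_call(llm, validate, max_retries=3):
    """Call an LLM, validate the reply, and on failure retry with the
    validation error appended as feedback — the loop Instructor wraps
    around Pydantic models."""
    feedback = None
    for _ in range(max_retries):
        raw = llm(feedback)
        try:
            return validate(raw)
        except ValueError as exc:
            feedback = f"Invalid output ({exc}); reply with valid JSON only."
    raise RuntimeError("no valid output after retries")

# stub model: first reply is malformed, the retry succeeds
replies = iter(['{"age": "not a number"}', '{"age": 31}'])

def fake_llm(feedback):
    return next(replies)

def validate_person(raw):
    data = json.loads(raw)
    if not isinstance(data["age"], int):
        raise ValueError("age must be an integer")
    return data

person = structured_call(fake_llm, validate_person)
```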
Jan is an open-source ChatGPT-alternative desktop app that runs local LLMs offline on Windows, macOS, and Linux, with an OpenAI-compatible API server, model hub, and extensions.
Jina Reader is a free public API that converts any URL into clean LLM-ready Markdown — just prepend r.jina.ai — plus a self-host option for private data and higher rate limits.
KServe is a Kubernetes-native model serving platform — originally KFServing — that provides standard CRDs for deploying ML and LLM models with autoscaling, canary rollouts, and GPU support.
Laminar is an open-source LLM observability, evals, and prompt-management platform written in Rust, with a self-hostable stack (Postgres, Clickhouse) and a cloud option.
LanceDB is an embedded serverless vector database built on the Lance columnar format — zero-server, S3-native, and optimised for multimodal AI workloads with Rust, Python, and TypeScript SDKs.
LangChain is the dominant Python/TypeScript framework for building LLM applications — chains, agents, tool use, memory, and observability via LangSmith and deployment via LangGraph.
LangChain Hub is a shared registry for prompts, runnables, and reference agents, letting teams version and pull reusable LangChain artefacts via the LangSmith UI.
Langflow is an open-source Python-based visual IDE for designing LLM workflows, RAG pipelines, and agents, built on top of LangChain and now maintained by DataStax.
Langfuse is the leading open-source observability, tracing, prompt management, and evaluation platform for LLM apps — self-hostable, OTel-compatible, and framework-agnostic.
LangGraph is LangChain's stateful agent framework — a low-level library for building controllable, long-running LLM agents as graphs with checkpoints, human-in-the-loop, and durable execution.
LangSmith is LangChain's commercial observability, evaluation, and prompt-management platform for LLM apps — traces, datasets, online/offline evals, and prompt versioning in one tool.
Langtrace is an open-source OpenTelemetry-native observability platform for LLM apps with SDKs for Python and TypeScript, plus a self-hostable UI and cloud option from Scale3.
LaVague is an open-source large-action-model framework that turns natural-language instructions into Selenium/Playwright code — combining a world model, an action engine, and retrieval over DOM snippets.
Letta (formerly MemGPT) is an open-source framework and server for building stateful agents with long-term memory, self-editing context, and persistent state — based on Berkeley's MemGPT research.
Liger Kernel is LinkedIn's open-source collection of fused Triton kernels for LLM training — RMSNorm, RoPE, SwiGLU, and CrossEntropy — delivering 20-30% speedups and 50%+ memory savings.
Lilypad is an open-source prompt-engineering and LLM observability toolkit from Mirascope Labs, offering versioned prompt experiments, traces, and evals.
LiteLLM is an open-source Python SDK and proxy that normalises 100+ LLM providers (OpenAI, Anthropic, Azure, Bedrock, Vertex, Ollama) behind a single OpenAI-compatible API with cost tracking, fallbacks, and retries.
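The fallback-and-retry behaviour such gateways provide can be sketched with stub providers — this is an illustration of the routing pattern, not LiteLLM's API; `complete_with_fallback`, `flaky`, and `healthy` are all hypothetical:

```python
def complete_with_fallback(prompt, providers, max_retries=2):
    """Try each provider in order, retrying transient failures before
    falling through to the next one. `providers` maps a model name to a
    callable that returns a completion or raises (stand-ins for SDK calls)."""
    errors = {}
    for model, call in providers.items():
        for _ in range(max_retries):
            try:
                return model, call(prompt)
            except Exception as exc:
                errors[model] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")

def flaky(prompt):
    raise TimeoutError("upstream timeout")

def healthy(prompt):
    return f"echo: {prompt}"

model, text = complete_with_fallback("hi", {"gpt-4o": flaky, "claude": healthy})
```

Production routers add cost tracking, rate-limit awareness, and streaming on top of this skeleton.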
LitGPT is Lightning AI's hackable implementation of 20+ LLM architectures — pretraining, fine-tuning, LoRA, QLoRA, and serving, all in readable PyTorch without wrappers on top of wrappers.
Llama Stack is Meta's standardised API surface for building LLM apps — inference, safety, memory, agents, and evals behind one vendor-agnostic spec with Python/Node SDKs.
llama-cpp-python is the official Python binding for llama.cpp, exposing local GGUF inference with an OpenAI-compatible server, LangChain integration, and CPU/GPU acceleration.
llama.cpp is a C/C++ inference engine for LLMs that runs Llama, Mistral, Qwen, Gemma, Phi and hundreds of other open-weight models on laptops, servers, and edge devices — no Python or CUDA required.
llamafile is Mozilla's project that packages an LLM, its weights, and llama.cpp into a single executable that runs on Linux, macOS, Windows, and BSD with no install — a fully portable local model.
LlamaIndex is the Python/TypeScript framework for building RAG and retrieval pipelines over your data — 160+ loaders, query engines, and a commercial Llama Cloud for hosted ingestion.
LlamaParse is LlamaIndex's hosted document parser specialised for LLM ingestion — turning complex PDFs, slides, and tables into clean, structured Markdown.
LLM Guard is a security-focused open-source toolkit from Protect AI — input and output scanners for prompt injection, PII, toxicity, bias, and secret leakage that drop in front of any LLM API.
LM Studio is a polished desktop app for discovering, downloading, and running local LLMs on Windows, macOS, and Linux, with an OpenAI-compatible local server and a headless CLI for production.
Log10 is an LLM observability and evaluation platform with automated log feedback, self-hosted deployment options, and debugging tools for production agents.
LoRAX is Predibase's open-source LLM server specialised in hot-swapping hundreds of LoRA adapters on a single base model for low-cost multi-tenant inference.
Marker is an open-source PDF-to-Markdown converter from Datalab that preserves layout, tables, code, and equations — widely used as the first stage of RAG pipelines and document ingestion.
Marvin is a lightweight Python library from Prefect for building AI features using type hints — classify, extract, transform, or generate with decorators over Pydantic models and native function signatures.
Mastra is a TypeScript-first agent framework from the Gatsby founders — agents, workflows, RAG, memory, evals, and observability, designed to run on Node and edge runtimes.
Megatron-LM is NVIDIA's research framework for training very large transformer models — pioneered tensor and pipeline parallelism and provides the reference kernels used across the industry.
Meilisearch is an open-source, developer-friendly search engine written in Rust with instant typo-tolerant BM25 search, hybrid vector+keyword retrieval, and a simple REST API — a common RAG companion to LLM stacks.
MetaGPT is a multi-agent framework that assigns software-engineering roles (PM, architect, engineer, QA) to specialised LLM agents to collaboratively build projects.
Presidio is Microsoft's open-source PII detection and anonymisation framework — spaCy + regex + pattern recognisers that identify and redact personal data in text, images, and structured data before it hits an LLM.
PromptFlow is Microsoft's open-source toolkit for building, evaluating, and deploying LLM applications, integrated with Azure AI Foundry for production pipelines and tracing.
Milvus is a graduated CNCF open-source vector database engineered for billion-scale similarity search — distributed architecture, GPU indexing, hybrid dense+sparse retrieval, and a mature managed offering via Zilliz Cloud.
Mirascope is a developer-friendly Python toolkit for LLMs — Pythonic prompt templates via decorators, typed outputs with Pydantic, and first-class support for every major provider with a thin, composable API.
MLC LLM is a universal LLM deployment engine that compiles models to run efficiently on phones, browsers (WebGPU), Macs, and any GPU — enabling client-side inference without a server.
MLflow's LLM evaluation module adds mlflow.evaluate() support for language-model outputs — built-in metrics like toxicity, ROUGE, faithfulness, and custom GenAI judges logged alongside regular ML experiments.
Modal is a serverless cloud for AI and data workloads — Python-first, GPU-ready, with zero-config containers, scheduled jobs, web endpoints, and a developer experience that feels closer to importing a decorator than deploying infra.
MAX is Modular's unified AI platform — a high-performance serving engine and Mojo-based development stack designed to outperform TensorRT-LLM and vLLM on common hardware.
NeMo Guardrails is NVIDIA's open-source toolkit for adding programmable rails around LLM apps — topical, dialog, moderation, and retrieval guardrails written in the Colang DSL.
NVIDIA Triton Inference Server is an open-source model server supporting PyTorch, TensorFlow, ONNX, TensorRT, and TensorRT-LLM backends for high-throughput model serving.
Ollama is the most popular local-first runtime for open-weight LLMs — a single binary that downloads, quantises, and serves models like Llama, Qwen, Mistral, Gemma, and Phi over an OpenAI-compatible API.
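The quantisation these runtimes apply can be illustrated with a deliberately simplified symmetric 8-bit scheme — one int8 value per weight plus a per-block float scale. This is a teaching sketch, much cruder than real GGUF block formats:

```python
def quantize_q8(weights):
    """Symmetric 8-bit quantisation of one block: map the largest magnitude
    to +/-127 and store a single float scale for the whole block."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_q8(q, scale):
    """Recover approximate floats; error is bounded by half a quant step."""
    return [v * scale for v in q]

weights = [0.4, -1.27, 0.03, 0.9]
q, scale = quantize_q8(weights)
restored = dequantize_q8(q, scale)
```

Shrinking each weight from 4 bytes to roughly 1 is what lets 7B-parameter models fit in laptop RAM.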
olmOCR is AllenAI's open-source OCR toolkit that converts PDFs and scans to clean linearised text using a vision-language model fine-tuned on millions of pages — tuned for trillion-token pretraining corpora.
Open Interpreter is a natural-language interface to your computer — it writes and executes Python, Bash, JavaScript, or AppleScript locally so an LLM can edit files, query APIs, or drive native apps from a single terminal REPL.
The OpenAI Agents SDK is OpenAI's official 2025 framework for building agentic apps with handoffs, guardrails, sessions, and tracing — a production-ready successor to the earlier Swarm experiment.
OpenAI Evals is OpenAI's open-source framework for building and running LLM evaluations, plus a registry of crowd-contributed benchmarks covering many tasks.
The official Python SDK for the OpenAI API, covering Chat Completions, Responses, Assistants, Realtime, Files, Fine-tuning, Embeddings, Images, and Audio.
OpenCompass is Shanghai AI Lab's comprehensive LLM evaluation platform supporting 100+ benchmarks and 20+ model families, widely used in the Chinese AI community.
OpenLLM by BentoML is an open platform for running and deploying open-source LLMs as OpenAI-compatible APIs, with one-command serving and built-in bento packaging.
OpenLLMetry is Traceloop's open-source OpenTelemetry extension that adds standardized LLM spans to your existing tracing stack — one library, any OTLP backend.
OpenRouter is a hosted AI router that gives you a single OpenAI-compatible endpoint plus one billing account for 300+ models across Anthropic, OpenAI, Google, Meta, Mistral, DeepSeek, and open-source providers.
Opik is Comet's open-source LLM observability and evaluation platform — trace logging, prompt playground, LLM-as-judge evals, and a hosted tier that plugs into LangChain, LlamaIndex, and OpenAI.
Outlines is a Python library for structured text generation — it constrains an LLM's output to match a JSON schema, regex, context-free grammar, or Pydantic model at the decoding step, guaranteeing valid structure.
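The decode-time idea — mask out any token that could not lead to a valid output — can be shown with a toy "choice" constraint. This is a conceptual sketch, not Outlines' implementation; `constrained_decode` and the toy scorer are hypothetical:

```python
def constrained_decode(score_token, vocab, choices):
    """Greedy decoding restricted so the text generated so far is always a
    prefix of some allowed choice; invalid continuations are never sampled.
    `score_token` stands in for model logits."""
    out = ""
    while out not in choices:
        # keep only tokens that stay on a path to a valid choice
        legal = [t for t in vocab if any(c.startswith(out + t) for c in choices)]
        out += max(legal, key=lambda t: score_token(out, t))
    return out

vocab = ["pos", "neg", "itive", "ative", "neutral"]
choices = {"positive", "negative", "neutral"}
# toy scorer that always prefers longer tokens
result = constrained_decode(lambda ctx, t: len(t), vocab, choices)
```

Outlines generalises this from finite choices to regexes, grammars, and JSON schemas by compiling them to token-level automata.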
Patronus AI is an evaluation and guardrail platform for LLM applications with a library of judge models (Lynx for hallucination detection), scenario testing, and regulated-industry benchmarks.
pdfplumber is a Python library for extracting text, tables, and layout metadata from PDFs, built on pdfminer.six — the go-to tool when you need per-character precision and reliable table extraction.
PEFT is Hugging Face's library of parameter-efficient fine-tuning methods — LoRA, QLoRA, IA3, prefix tuning, and more — implemented as wrappers on top of Transformers and Accelerate.
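The arithmetic behind LoRA is compact enough to show directly: the frozen weight's output plus a scaled low-rank correction, y = Wx + (alpha/r)·B(Ax). A pure-Python sketch with toy matrices (not PEFT's API):

```python
def lora_forward(W, A, B, x, alpha=16, r=2):
    """Frozen base output plus a low-rank update from two small trainable
    matrices: A projects down to rank r, B projects back up."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen weight (toy 2x2)
A = [[0.1, 0.0], [0.0, 0.1]]   # rank-2 down-projection
B = [[0.0, 0.0], [0.0, 0.0]]   # initialised to zero, as in LoRA
y = lora_forward(W, A, B, [3.0, 4.0])
```

With B initialised to zero the adapter starts as a no-op, so training begins exactly at the pretrained model; only A and B (a tiny fraction of the parameters) receive gradients.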
pgvector is the de facto vector similarity extension for Postgres — IVFFlat and HNSW indexes, exact and approximate search, and full SQL joins against your existing tables, no separate database required.
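What the cosine-distance operator computes (and what `ORDER BY embedding <=> query LIMIT k` does with it) can be replicated in a few lines of Python — purely illustrative, since the real work happens inside Postgres with an index:

```python
import math

def cosine_distance(a, b):
    """Cosine distance as pgvector's <=> operator defines it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def top_k(query, rows, k=2):
    """ORDER BY embedding <=> query LIMIT k, in pure Python."""
    return sorted(rows, key=lambda r: cosine_distance(r[1], query))[:k]

rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
nearest = top_k([1.0, 0.0], rows, k=2)
```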
Phind is an AI search engine and coding assistant for developers that grounds answers in live web results and documentation, with a VS Code extension and a line of fine-tuned open coding models.
Pinecone is the market-leading managed vector database for production AI — serverless pay-per-use architecture, billions-scale indexes, hybrid search, and native integrations with every major LLM stack.
Portkey is an AI gateway that sits between your app and LLM providers, adding semantic caching, retries, load balancing, guardrails, cost limits, and prompt management across 200+ models.
PromptBench is a Microsoft unified Python library for evaluating LLMs across benchmarks, adversarial prompts, prompt engineering, and dynamic evaluation protocols.
Promptfoo is an open-source CLI and library for testing, evaluating, and red-teaming LLM prompts — YAML-first configs, matrix sweeps across providers, and a web viewer for side-by-side diffs.
Pydantic AI is a typed, Pythonic agent framework from the Pydantic team that brings FastAPI-style ergonomics to building production LLM apps with structured outputs, dependency injection, and built-in evals.
Qdrant is a high-performance open-source vector database written in Rust — rich payload filtering, hybrid dense/sparse search, quantisation, and a managed Qdrant Cloud offering.
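Hybrid dense/sparse results are commonly merged with reciprocal rank fusion, which needs only ranks, not comparable scores. A minimal sketch of the standard RRF formula (illustrative; each engine has its own fusion options):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists (e.g. dense-vector and BM25 retrieval)
    by summing 1/(k + rank) per document; k=60 is the conventional constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # nearest-neighbour order
sparse = ["d1", "d4", "d3"]  # keyword/BM25 order
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents ranked well by both retrievers ("d1", "d3") float to the top without any score normalisation.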
R2R (Reason to Retrieve) is SciPhi's open-source RAG server — ingestion, hybrid search, knowledge graphs, agentic retrieval, and multi-tenant auth in a single deployable service.
Ragas is the standard open-source evaluation framework for RAG and agentic LLM applications — metrics for faithfulness, answer relevancy, context precision/recall, and agent tool use.
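One common formulation of a context-precision-style metric — rank-weighted precision over the retrieved chunks — fits in a few lines. This is a simplified sketch of the idea; the actual Ragas metric additionally uses an LLM judge to produce the relevance labels:

```python
def context_precision(relevances):
    """Average precision@k taken at each rank k holding a relevant chunk:
    rewards retrievers that put relevant context near the top."""
    hits, score = 0, 0.0
    for k, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            score += hits / k
    return score / hits if hits else 0.0

# relevant chunks at ranks 1 and 3 of the 3 retrieved
score = context_precision([True, False, True])
```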
RAGatouille is a Python library that makes ColBERT-style late-interaction retrieval practical for RAG pipelines — index, search, and fine-tune ColBERT models with a few lines, often beating single-vector dense retrieval.
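The late-interaction scoring ColBERT uses — MaxSim — is simple to state: each query token vector takes its best match over all document token vectors, and the per-token maxima are summed. A toy sketch with hand-made 2-d vectors (not RAGatouille's API):

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style MaxSim: sum over query tokens of the max dot product
    against any document token vector."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.5, 0.5]]
score = maxsim_score(query, doc)
```

Keeping one vector per token (instead of pooling to a single vector) is what lets late interaction recover fine-grained term matches that single-vector dense retrieval blurs away.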
Ray Serve LLM is Anyscale's batteries-included module for serving LLMs on Ray clusters, bundling vLLM, Ray autoscaling, and an OpenAI-compatible API.
Reducto is a document-AI API that parses complex PDFs, spreadsheets, and scans into structured JSON or Markdown with layout-aware chunking — built for enterprise RAG on financial filings and contracts.
Replicate is a pay-per-second inference cloud for open-source ML models — one HTTP call to run Flux, Llama, Whisper, or any custom model pushed via the Cog container format.
Requesty is an AI request router and gateway that gives developers one API key for hundreds of models, with smart routing based on cost, latency, or quality plus usage analytics and fallback handling.
Rivet is Ironclad's open-source desktop IDE for visually designing, debugging, and executing LLM agent graphs with a focus on local development ergonomics.
Semantic Kernel is Microsoft's open-source SDK for orchestrating LLMs, plugins, and memory in C#, Python, and Java — the enterprise-friendly alternative to LangChain with first-class Azure OpenAI support.
SGLang is a high-performance LLM serving framework with a structured-generation front-end and a RadixAttention backend that accelerates prompts with shared prefixes, often outperforming vLLM on structured workloads.
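The quantity RadixAttention optimises is easy to picture: how many leading prompt tokens can reuse an already-computed KV cache. A sketch of that prefix-reuse measurement (conceptual only; SGLang maintains the caches in a radix tree across many requests):

```python
def shared_prefix_length(cached_tokens, new_tokens):
    """Count leading tokens a new prompt shares with a cached one — work
    that never has to be recomputed when the cache is reused."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

system = list(range(100))        # stand-in for a long shared system prompt
req_a = system + [201, 202]
req_b = system + [301]
reused = shared_prefix_length(req_a, req_b)
```

When thousands of requests share a system prompt or few-shot prefix, skipping those prefill tokens is where the speedup over naive per-request prefill comes from.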
Skyvern is an open-source self-hostable browser automation platform that uses LLMs plus computer vision to complete web tasks — form filling, data scraping, and multi-step flows — without brittle XPath selectors.
smolagents is Hugging Face's minimal agent framework (~1000 LOC) focused on code-writing agents — LLMs that plan by generating Python rather than JSON tool calls.
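The code-acting loop can be miniaturised: the model replies with Python, the runtime executes it, and the loop ends when the code calls a final-answer hook. Everything below (`code_agent`, `stub_model`) is a hypothetical sketch of the pattern, not smolagents' API — and a real runtime would sandbox the `exec`:

```python
def code_agent(model, task, max_steps=5):
    """Run model-written Python until it calls final_answer() or the step
    budget runs out. `model` is a stub standing in for an LLM."""
    answer = {}
    def final_answer(value):
        answer["value"] = value
    history = [task]
    for _ in range(max_steps):
        code = model("\n".join(history))
        exec(code, {"final_answer": final_answer})  # unsandboxed: demo only
        if "value" in answer:
            return answer["value"]
        history.append(code)
    raise RuntimeError("agent gave no final answer")

def stub_model(prompt):
    # a real LLM would plan; the stub just emits the solving code directly
    return "final_answer(sum(range(10)))"

result = code_agent(stub_model, "add the numbers 0..9")
```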
HELM (Holistic Evaluation of Language Models) is Stanford CRFM's reproducible benchmark suite covering accuracy, calibration, robustness, bias, toxicity, and efficiency.
Tabnine is an enterprise-focused AI coding assistant with on-prem deployment, custom-model fine-tuning on private repos, and strong data-governance controls — used in regulated industries and large engineering orgs.
Tantivy is a fast, full-text search engine library written in Rust — a Lucene-inspired foundation for building custom BM25 hybrid search in RAG stacks with Python bindings (tantivy-py).
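BM25 itself is only a few lines once documents are tokenised. A minimal reference implementation of the standard formula (illustrative; engines like Tantivy add inverted indexes, compression, and tuned variants):

```python
import math

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query terms: an IDF weight
    times a saturating, length-normalised term frequency."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = [0.0] * n
    for term in query_terms:
        df = sum(term in d for d in docs)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d.count(term)
            scores[i] += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
    return scores

docs = [["rust", "search", "engine"],
        ["python", "web", "framework"],
        ["rust", "rust", "macros"]]
scores = bm25_scores(["rust"], docs)
```

The `k1` saturation means a second occurrence of a term helps less than the first, and `b` discounts matches in long documents.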
TaskWeaver is Microsoft's code-first agent framework that converts user requests into executable Python plans, designed for data analytics and rich plugin ecosystems.
NVIDIA TensorRT-LLM is a C++/Python library that compiles LLMs into highly-optimised CUDA engines for H100/H200/B200 GPUs, delivering the highest raw throughput of any inference stack on NVIDIA hardware.
Text Generation Inference is Hugging Face's high-performance inference server for serving open-source LLMs with continuous batching, tensor parallelism, and quantisation.
Together AI's Python and TypeScript SDKs give an OpenAI-compatible interface to 200+ open-source models (Llama, Mixtral, DeepSeek, Qwen) served on Together's low-latency GPU cloud.
Together AI's managed fine-tuning service runs SFT, DPO, and continued-pretraining jobs on open-weight models (Llama, Mistral, Qwen, DeepSeek) via a hosted API, returning a deployable endpoint.
torchtune is PyTorch's official native fine-tuning library for LLMs — recipe-driven SFT, LoRA, QLoRA, DPO, and distributed training without the Hugging Face Transformers abstraction layer.
Trafilatura is a widely-used Python library for extracting main content, metadata, and comments from HTML — fast, purely local, and consistently ranked top on web-extraction benchmarks.
TRL is Hugging Face's official library for post-training LLMs — supervised fine-tuning, PPO, DPO, ORPO, KTO, GRPO, and reward-model training, all built on Transformers and Accelerate.
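The DPO objective at the heart of preference post-training is a single expression over four log-probabilities. A numeric sketch of the published loss (toy values, not TRL's trainer API):

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * ((pi_c - pi_r) - (ref_c - ref_r))): each argument is
    the summed log-probability of the chosen/rejected completion under the
    trained policy or the frozen reference model."""
    margin = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# when the policy has not moved from the reference, the loss is log 2
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

The loss falls as the policy widens the chosen-over-rejected gap beyond the reference's gap — preference optimisation without training a separate reward model.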
TruLens is Snowflake's open-source LLM-observability and evaluation library — feedback functions for groundedness, relevance, and toxicity plus a local dashboard that traces every RAG call.
turbopuffer is a serverless object-storage-native vector database — cold-start friendly, pay-per-query, and designed to hold billions of vectors at a fraction of memory-resident DB cost while still delivering sub-second ANN search.
txtai is an all-in-one embeddings database and AI toolkit for Python — vector search, RAG pipelines, agents, and language model workflows in a single lightweight package.
TypeChat is a Microsoft library that uses TypeScript types as the schema for LLM outputs, yielding strongly-typed, validated JSON responses without a heavy orchestration layer.
Unsloth is a Python library that fine-tunes open-source LLMs (Llama, Mistral, Qwen, Gemma, Phi) 2-5x faster than HuggingFace defaults with 60-80% less memory, using custom Triton kernels and manual backprop.
Unstructured is an open-source toolkit that extracts, cleans, and chunks content from PDFs, HTML, emails, and office docs into LLM-ready structured elements.
Verba is Weaviate's open-source RAG chatbot — a ready-to-deploy golden-path example for ingesting documents, indexing into Weaviate, and chatting with your data via a polished web UI.
The Vercel AI SDK is a TypeScript library for building AI-powered apps — unified generation API across OpenAI, Anthropic, Google, and 20+ providers, streaming React UI helpers, and agent / tool-use primitives.
Vespa is Yahoo's open-source search and retrieval engine — tensor ranking, late-interaction ColBERT, vector ANN, and structured query evaluation in one distributed platform used for web-scale AI search.
vLLM is the leading open-source high-throughput inference and serving engine for LLMs — PagedAttention, continuous batching, prefix caching, tensor/pipeline parallelism, and OpenAI-compatible API.
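PagedAttention's core move — carving the KV cache into fixed-size blocks handed out on demand, like virtual-memory pages — can be sketched with a toy allocator. This is a conceptual illustration, far simpler than vLLM's actual block manager:

```python
import math

class PagedKVCache:
    """Toy block allocator: memory is wasted only in the last, partially
    filled block of each sequence, instead of reserving max-length slabs."""
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))
        self.tables = {}  # seq_id -> list of block ids

    def allocate(self, seq_id, num_tokens):
        needed = math.ceil(num_tokens / self.block_size)
        if needed > len(self.free):
            raise MemoryError("out of KV blocks")
        self.tables[seq_id] = [self.free.pop() for _ in range(needed)]
        return self.tables[seq_id]

    def release(self, seq_id):
        self.free.extend(self.tables.pop(seq_id))

cache = PagedKVCache(num_blocks=8, block_size=16)
blocks = cache.allocate("req-1", 40)   # 40 tokens -> 3 blocks of 16
```

Near-zero fragmentation is what lets vLLM pack many more concurrent sequences into the same GPU memory, which continuous batching then exploits.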
Weights & Biases Weave is a toolkit for tracking, evaluating, and iterating on LLM applications with automatic call tracing, datasets, and rigorous evaluation.
Weaviate is an open-source vector database with native hybrid search, generative modules, multi-tenancy, and a strong ecosystem of first-party apps like Verba for RAG chatbots.
Zed AI is the built-in AI assistant panel and agentic editing system inside the Zed editor — a high-performance Rust IDE with multi-model chat, inline edits, and an agentic Zed Edit mode.