Capability · Framework — model serving

KServe

KServe is the CNCF model-serving project that grew out of Kubeflow's KFServing. It provides a Kubernetes Custom Resource (InferenceService) to deploy models behind the standard v1 (predict/explain) and v2 (Open Inference Protocol) APIs, integrating Knative for scale-to-zero, Istio for traffic splitting, and predictor runtimes for TF/PyTorch/XGBoost plus LLM runtimes (vLLM, TGI). It's the default choice for teams that want a K8s-native serving layer instead of a SaaS.
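Traffic splitting surfaces directly in the InferenceService spec via the `canaryTrafficPercent` field. A minimal sketch of a canary rollout; the service name and storage URI here are illustrative, not from the quickstart:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris            # illustrative name
spec:
  predictor:
    canaryTrafficPercent: 10    # route 10% of traffic to the latest revision
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/iris-v2   # assumption: your model path
```

Updating `storageUri` creates a new revision; KServe keeps the previous one serving 90% of requests until you promote the canary.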

Framework facts

Category
model serving
Language
Go / Python
License
Apache-2.0
Repository
https://github.com/kserve/kserve

Install

# Requires a K8s cluster with Knative + cert-manager
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.13.0/kserve.yaml

Quickstart

# inference-service.yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-vllm
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      runtime: kserve-huggingfaceserver
      storageUri: s3://models/llama-3.1-8b
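With the manifest saved, deployment is the usual kubectl flow. A sketch assuming the quickstart's service name and namespace `default`; the URL column in the status is the serving endpoint:

```shell
# Apply the manifest from the quickstart above
kubectl apply -f inference-service.yaml

# READY turns True once the predictor revision is up;
# the URL column shows where to send inference requests
kubectl get inferenceservice llama-vllm
```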

Alternatives

  • BentoML — simpler, Python-first
  • NVIDIA Triton — raw perf
  • Ray Serve — Python-native, scales on Ray clusters

Frequently asked questions

Is KServe still called KFServing?

KFServing was renamed to KServe in 2021 when it spun out of Kubeflow; the old name still shows up in older docs and blog posts.

Does KServe serve LLMs natively?

Yes — since v0.11 there's a Hugging Face runtime and vLLM serving runtime, so you can deploy an LLM via a declarative InferenceService without writing serving code.
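The Hugging Face runtime also exposes OpenAI-compatible endpoints (e.g. `/openai/v1/completions`), so existing OpenAI-client code can target a deployed InferenceService. A sketch of the request body; the base URL is an assumption about your ingress setup, and the model name matches the quickstart:

```python
import json

# Hypothetical external URL for the "llama-vllm" InferenceService;
# the real host depends on your ingress/domain configuration (assumption).
BASE_URL = "http://llama-vllm.default.example.com"
COMPLETIONS_PATH = "/openai/v1/completions"

def completion_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style completion request body, as accepted by
    KServe's Hugging Face serving runtime (vLLM backend)."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }

body = completion_request("llama-vllm", "Kubernetes is")
print(json.dumps(body))
```

POST this JSON to `BASE_URL + COMPLETIONS_PATH` with any HTTP client; no KServe-specific SDK is required.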

Sources

  1. KServe docs — accessed 2026-04-20
  2. KServe GitHub — accessed 2026-04-20