Capability · Framework — model serving
KServe
KServe is the CNCF model-serving project that grew out of Kubeflow's KFServing. It provides Kubernetes Custom Resources (InferenceService) to deploy models behind standard predict / explain / v2 inference APIs, integrating Knative for scale-to-zero, Istio for traffic splitting, and predictor runtimes for TF/PyTorch/XGBoost plus LLM runtimes (vLLM, TGI). It's the default choice for teams that want a K8s-native serving layer instead of a SaaS.
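The v2 inference API mentioned above follows the Open Inference Protocol: a JSON body of named tensors posted to a per-model `infer` path. A minimal sketch of what such a request looks like; the model name, tensor name, and shape here are illustrative assumptions, not part of the quickstart below.

```python
import json

# Open Inference Protocol (v2) request body.
# Tensor name, shape, and values are illustrative.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [6.8, 2.8, 4.8, 1.4],
        }
    ]
}

# v2 endpoint path convention: /v2/models/<model-name>/infer
path = "/v2/models/sklearn-iris/infer"
body = json.dumps(payload)
```

A client would POST `body` to the InferenceService's host at `path`; the response carries an `outputs` list with the same tensor structure.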
Framework facts
- Category
- model serving
- Language
- Go / Python
- License
- Apache-2.0
- Repository
- https://github.com/kserve/kserve
Install
# Requires a K8s cluster with Knative + cert-manager
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.13.0/kserve.yaml
Quickstart
# inference-service.yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-vllm
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      runtime: kserve-huggingfaceserver
      storageUri: s3://models/llama-3.1-8b
Alternatives
- BentoML — simpler, Python-first
- NVIDIA Triton — raw perf
- Ray Serve — Python-native, built on Ray for distributed scaling
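The intro notes Istio-backed traffic splitting; KServe surfaces this declaratively through the `canaryTrafficPercent` field on the predictor spec. A sketch extending the quickstart manifest (model name and storage URI reused; the 10% split is an illustrative value):

```yaml
# Canary rollout sketch: route 10% of traffic to the newest revision,
# 90% to the previous one. Field names follow the v1beta1 spec.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-vllm
spec:
  predictor:
    canaryTrafficPercent: 10
    model:
      modelFormat:
        name: huggingface
      runtime: kserve-huggingfaceserver
      storageUri: s3://models/llama-3.1-8b
```

Promoting the canary is a matter of raising `canaryTrafficPercent` and eventually removing it, rather than hand-editing Istio VirtualServices.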
Frequently asked questions
Is KServe still called KFServing?
KFServing was renamed to KServe in 2021 when it spun out of Kubeflow; the old name still shows up in older docs and blog posts.
Does KServe serve LLMs natively?
Yes — since v0.11 there's a Hugging Face runtime and vLLM serving runtime, so you can deploy an LLM via a declarative InferenceService without writing serving code.
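For LLM deployments like the quickstart above, the Hugging Face runtime also speaks an OpenAI-compatible protocol, so stock OpenAI clients can point at the InferenceService. A minimal request-construction sketch; the ingress hostname and the `/openai/v1/chat/completions` route are assumptions here (in a real cluster the host comes from the InferenceService's status URL):

```python
import json
import urllib.request

# Hypothetical ingress host for the quickstart's llama-vllm service.
base = "http://llama-vllm.default.example.com"

# OpenAI-style chat completion request (route assumed).
req = urllib.request.Request(
    url=base + "/openai/v1/chat/completions",
    data=json.dumps({
        "model": "llama-vllm",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32,
    }).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted since no cluster is assumed.
```

The point is that no bespoke serving code is needed on either side: the manifest declares the runtime, and existing OpenAI-compatible tooling consumes it.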
Sources
- KServe docs — accessed 2026-04-20
- KServe GitHub — accessed 2026-04-20