Add KServe support with vLLM backend and Ansible integration #48

Copilot · 2026-01-30T10:13:47Z

Adds Kubernetes-native model serving via KServe with vLLM runtime, providing serverless inference, autoscaling, and canary deployment capabilities for Intel Xeon and Gaudi platforms.

Implementation

Helm Chart (core/helm-charts/kserve/)

InferenceService CRD template with vLLM runtime configuration
Platform-optimized values: xeon-values.yaml (pipeline parallelism), gaudi-values.yaml / gaudi3-values.yaml (tensor parallelism, bfloat16)
Resource templates: PVC, ConfigMap, Service, Ingress, ApisixRoute, ServiceMonitor
Security: read-only root filesystem, non-root user (1001), minimal capabilities (SYS_PTRACE/IPC_LOCK for Gaudi only)
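
For orientation, a manifest rendered from the InferenceService template might look roughly like the sketch below. This is not copied from the chart: the resource name, namespace, runtime name, and storage URI are hypothetical placeholders, and only the security settings (read-only root filesystem, non-root UID 1001) come from the list above.

```yaml
# Minimal sketch of a rendered InferenceService; all names are hypothetical.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm                  # hypothetical
  namespace: kserve-models           # hypothetical
spec:
  predictor:
    minReplicas: 1                   # illustrative autoscaling bounds
    maxReplicas: 2
    model:
      modelFormat:
        name: vLLM                   # assumed; must match a format the serving runtime declares
      runtime: example-vllm-runtime  # hypothetical ClusterServingRuntime name
      storageUri: pvc://example-model-pvc/models/example   # hypothetical PVC and path
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        readOnlyRootFilesystem: true
```

The actual template in core/helm-charts/kserve/ may structure these fields differently; the sketch only shows where each setting would sit in the KServe v1beta1 schema.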

Ansible Playbooks

deploy-kserve-operator.yml: Installs KServe v0.13.0 CRDs/controller, creates Intel-optimized ClusterServingRuntimes
deploy-kserve-models.yml: Helm-based model deployment with platform detection, HuggingFace token validation, proper cleanup with k8s_info polling
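
The k8s_info polling mentioned above can be sketched as a task that retries until the resource is gone. This is an illustration rather than the playbook's actual task: it assumes the kubernetes.core collection is installed, and the variable names kserve_model_name and kserve_namespace are hypothetical.

```yaml
# Sketch: poll until the InferenceService no longer exists instead of sleeping for a fixed time.
- name: Wait for the InferenceService to be removed
  kubernetes.core.k8s_info:
    api_version: serving.kserve.io/v1beta1
    kind: InferenceService
    name: "{{ kserve_model_name }}"       # hypothetical variable
    namespace: "{{ kserve_namespace }}"   # hypothetical variable
  register: isvc_info
  until: isvc_info.resources | length == 0   # k8s_info returns matches in `resources`
  retries: 30                                # illustrative: up to 30 checks
  delay: 10                                  # illustrative: 10 seconds between checks
```

Polling this way lets cleanup proceed as soon as the resource disappears and fails visibly if it never does, which a fixed pause cannot do.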

Configuration

inventory/metadata/vars/inference_kserve.yml: 60+ variables (platform selection, autoscaling, storage, model-specific args)
inventory/inference-config.cfg: Added deploy_kserve_operator, deploy_kserve_models, uninstall_kserve flags
Example configs in docs/examples/kserve/ for Xeon and Gaudi deployments

Integration Points

Observability: ServiceMonitor for Prometheus scraping
API Gateway: ApisixRoute resources for traffic routing
Existing workflow: Reuses helm_charts_base, remote_helm_charts_base from current playbook patterns
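
As a rough picture of the observability hook, a ServiceMonitor for a vLLM-backed service could look like the sketch below. The names, labels, and port name are hypothetical; the real template may select the Service differently. The /metrics path reflects where vLLM's OpenAI-compatible server exposes Prometheus metrics.

```yaml
# Sketch of a ServiceMonitor; names, labels, and port are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-llm-metrics      # hypothetical
  namespace: kserve-models       # hypothetical
  labels:
    release: prometheus          # hypothetical; must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: example-llm           # hypothetical label on the model's Service
  endpoints:
    - port: http                 # hypothetical named port on the Service
      path: /metrics
      interval: 30s              # illustrative scrape interval
```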

Usage

# Install operator
ansible-playbook -i inventory/hosts.yaml playbooks/deploy-kserve-operator.yml

# Deploy model on Xeon
cp docs/examples/kserve/kserve-xeon-config.yml inventory/metadata/vars/inference_kserve.yml
ansible-playbook -i inventory/hosts.yaml playbooks/deploy-kserve-models.yml

# Deploy model on Gaudi
cp docs/examples/kserve/kserve-gaudi-config.yml inventory/metadata/vars/inference_kserve.yml
ansible-playbook -i inventory/hosts.yaml playbooks/deploy-kserve-models.yml

Key Configurations

Xeon: 16-32 cores, 64-128GB RAM, AVX512 node selector, pipeline parallelism enabled
Gaudi: 1 accelerator, 128-256GB RAM, enforce-eager mode, bfloat16 precision
Gaudi3: 1 accelerator, 256-512GB RAM, 16K context length support
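
One way to read the three profiles above is as values-file fragments like the following. The key names (resources, nodeSelector, extraArgs) are hypothetical and may not match the chart's values schema, and treating each range as request/limit is an assumption. The vLLM flags, the habana.ai/gaudi resource name, and the Node Feature Discovery AVX512 label are real, but their use here is illustrative.

```yaml
# Hypothetical xeon-values.yaml fragment
resources:
  requests: {cpu: "16", memory: 64Gi}
  limits: {cpu: "32", memory: 128Gi}
nodeSelector:
  feature.node.kubernetes.io/cpu-cpuid.AVX512F: "true"   # assumes Node Feature Discovery labels
extraArgs:
  - --pipeline-parallel-size=2    # illustrative value
---
# Hypothetical gaudi-values.yaml fragment
resources:
  requests: {memory: 128Gi, habana.ai/gaudi: 1}
  limits: {memory: 256Gi, habana.ai/gaudi: 1}   # extended resources need request == limit
extraArgs:
  - --enforce-eager
  - --dtype=bfloat16
---
# Hypothetical gaudi3-values.yaml fragment
resources:
  requests: {memory: 256Gi, habana.ai/gaudi: 1}
  limits: {memory: 512Gi, habana.ai/gaudi: 1}
extraArgs:
  - --max-model-len=16384         # 16K context length
```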

Documentation

docs/kserve-deployment-guide.md: Architecture, deployment steps, troubleshooting
docs/examples/kserve/QUICKSTART.md: 5-minute setup guide
core/helm-charts/kserve/README.md: Technical implementation details

Notes

Volumes (shm, tmp) are always created, regardless of PVC state, to prevent runtime failures
Gaudi capabilities (SYS_PTRACE for profiler, IPC_LOCK for DMA) documented in values files
Uninstall tasks use k8s_info polling instead of fixed delays for reliable cleanup
Compatible with existing observability stack (Prometheus, Grafana) and APISIX gateway
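
The first two notes can be pictured as the predictor fragment below. It is a sketch under assumptions: the size limit is illustrative, and dropping all other capabilities is an assumed reading of "minimal capabilities". With a read-only root filesystem, /tmp is only writable if a volume is mounted there, which is why the tmp volume cannot depend on whether a PVC is configured.

```yaml
# Sketch of a spec.predictor fragment (Gaudi variant); values are illustrative.
volumes:
  - name: shm
    emptyDir:
      medium: Memory             # backs /dev/shm with RAM
      sizeLimit: 8Gi             # illustrative
  - name: tmp
    emptyDir: {}
model:
  volumeMounts:
    - name: shm
      mountPath: /dev/shm
    - name: tmp
      mountPath: /tmp            # writable scratch space despite read-only root
  securityContext:
    readOnlyRootFilesystem: true
    capabilities:
      drop: ["ALL"]              # assumption: everything else is dropped
      add: ["SYS_PTRACE", "IPC_LOCK"]   # Gaudi only: profiler and DMA
```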

Original prompt

Deep analyse the repo and suggest the changes to add support for kserve with vllm backend. Provide ansible playbook integration.


Initial plan

e86eb07

Copilot AI assigned Copilot and amberjain1 Jan 30, 2026

Copilot started work on behalf of amberjain1 January 30, 2026 10:14

Copilot AI and others added 4 commits January 30, 2026 10:21

Add KServe Helm chart and Ansible playbook integration

3a401c3

Co-authored-by: amberjain1 <88656989+amberjain1@users.noreply.github.com>

Address code review feedback - fix volumes, validation, and documentation

da0dd23

Co-authored-by: amberjain1 <88656989+amberjain1@users.noreply.github.com>

Add comprehensive documentation and quick start guide for KServe

adbd055

Co-authored-by: amberjain1 <88656989+amberjain1@users.noreply.github.com>

Add complete implementation summary and technical analysis

ccf207b

Co-authored-by: amberjain1 <88656989+amberjain1@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add support for KServe with vLLM backend~~ Add KServe support with vLLM backend and Ansible integration Jan 30, 2026

Copilot AI requested a review from amberjain1 January 30, 2026 10:33

Copilot finished work on behalf of amberjain1 January 30, 2026 10:33
