Copilot AI commented Jan 30, 2026

Adds Kubernetes-native model serving via KServe with vLLM runtime, providing serverless inference, autoscaling, and canary deployment capabilities for Intel Xeon and Gaudi platforms.

Implementation

Helm Chart (core/helm-charts/kserve/)

  • InferenceService CRD template with vLLM runtime configuration
  • Platform-optimized values: xeon-values.yaml (pipeline parallelism), gaudi-values.yaml / gaudi3-values.yaml (tensor parallelism, bfloat16)
  • Resource templates: PVC, ConfigMap, Service, Ingress, ApisixRoute, ServiceMonitor
  • Security: read-only root filesystem, non-root user (1001), minimal capabilities (SYS_PTRACE/IPC_LOCK for Gaudi only)
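
For orientation, an InferenceService of the kind this chart renders might look roughly like the sketch below. It is not the chart's actual template output: the resource name, runtime name, model format name, storage URI, replica counts, canary percentage, and vLLM arguments are all placeholders chosen for illustration.

```yaml
# Illustrative sketch only; every name and value here is an assumption.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm                  # hypothetical model name
  namespace: default
spec:
  predictor:
    minReplicas: 1                   # autoscaling bounds (assumed values)
    maxReplicas: 2
    canaryTrafficPercent: 10         # share of traffic sent to the newest revision
    model:
      modelFormat:
        name: vllm                   # assumed; must match the runtime's supported formats
      runtime: example-vllm-runtime  # hypothetical ClusterServingRuntime name
      storageUri: pvc://example-model-pvc/models/example  # hypothetical PVC and path
      args:
        - --dtype=bfloat16
      resources:
        requests:
          cpu: "16"
          memory: 64Gi
        limits:
          cpu: "32"
          memory: 128Gi
```

Omitting `canaryTrafficPercent` sends all traffic to the latest revision, which is the simpler starting point for a first deployment.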

Ansible Playbooks

  • deploy-kserve-operator.yml: Installs KServe v0.13.0 CRDs/controller, creates Intel-optimized ClusterServingRuntimes
  • deploy-kserve-models.yml: Helm-based model deployment with platform detection, HuggingFace token validation, and cleanup that polls with k8s_info until resources are removed
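
The k8s_info polling mentioned above can be sketched as a single task that retries until the resource no longer exists. This is a minimal illustration, not the task from the playbook: the variable names `kserve_model_name` and `kserve_namespace` are hypothetical, the retry counts are arbitrary, and it assumes the `kubernetes.core` collection is installed.

```yaml
# Illustrative sketch; variable names and retry settings are assumptions.
- name: Wait until the InferenceService has been deleted
  kubernetes.core.k8s_info:
    api_version: serving.kserve.io/v1beta1
    kind: InferenceService
    name: "{{ kserve_model_name }}"       # hypothetical variable
    namespace: "{{ kserve_namespace }}"   # hypothetical variable
  register: isvc_info
  # Retry until the API returns no matching resources.
  until: isvc_info.resources | length == 0
  retries: 30
  delay: 10
```

Compared with a fixed pause, this returns as soon as the resource is gone and fails visibly if deletion is stuck after the retry budget is spent.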

Configuration

  • inventory/metadata/vars/inference_kserve.yml: 60+ variables (platform selection, autoscaling, storage, model-specific args)
  • inventory/inference-config.cfg: Added deploy_kserve_operator, deploy_kserve_models, uninstall_kserve flags
  • Example configs in docs/examples/kserve/ for Xeon and Gaudi deployments

Integration Points

  • Observability: ServiceMonitor for Prometheus scraping
  • API Gateway: ApisixRoute resources for traffic routing
  • Existing workflow: Reuses helm_charts_base, remote_helm_charts_base from current playbook patterns
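
As a rough picture of the two integration resources, the sketch below shows a ServiceMonitor that scrapes a metrics endpoint and an ApisixRoute that forwards a path prefix to the model's Service. All names, labels, ports, and paths are assumptions for illustration; the templates in the chart may differ, and the `release` label must match whatever selector the Prometheus instance uses.

```yaml
# Illustrative sketch only; names, labels, ports, and paths are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-llm-metrics
  labels:
    release: prometheus        # assumed; must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: example-llm         # hypothetical Service label
  endpoints:
    - port: http               # named port on the Service (assumed)
      path: /metrics
      interval: 30s
---
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: example-llm-route
spec:
  http:
    - name: example-llm
      match:
        paths:
          - /example-llm/*     # hypothetical path prefix
      backends:
        - serviceName: example-llm
          servicePort: 80
```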

Usage

# Install operator
ansible-playbook -i inventory/hosts.yaml playbooks/deploy-kserve-operator.yml

# Deploy model on Xeon
cp docs/examples/kserve/kserve-xeon-config.yml inventory/metadata/vars/inference_kserve.yml
ansible-playbook -i inventory/hosts.yaml playbooks/deploy-kserve-models.yml

# Deploy model on Gaudi
cp docs/examples/kserve/kserve-gaudi-config.yml inventory/metadata/vars/inference_kserve.yml
ansible-playbook -i inventory/hosts.yaml playbooks/deploy-kserve-models.yml

Key Configurations

Xeon: 16-32 cores, 64-128GB RAM, AVX512 node selector, pipeline parallelism enabled
Gaudi: 1 accelerator, 128-256GB RAM, enforce-eager mode, bfloat16 precision
Gaudi3: 1 accelerator, 256-512GB RAM, 16K context length support
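
The sketch below shows one way these settings could map onto values and vLLM arguments. The key layout is hypothetical and not the schema of the chart's values files. It reads each range as request and limit, which is an assumption, as are the parallel sizes. The AVX512 label assumes Node Feature Discovery is running, and `habana.ai/gaudi` assumes the Habana device plugin is installed.

```yaml
# Illustrative sketch; key names are hypothetical, not the chart's actual schema.
xeon:
  nodeSelector:
    feature.node.kubernetes.io/cpu-cpuid.AVX512F: "true"  # assumes Node Feature Discovery
  resources:
    requests:
      cpu: "16"
      memory: 64Gi
    limits:
      cpu: "32"
      memory: 128Gi
  args:
    - --pipeline-parallel-size=2   # assumed value
gaudi:
  resources:
    requests:
      memory: 128Gi
      habana.ai/gaudi: 1
    limits:
      memory: 256Gi
      habana.ai/gaudi: 1
  args:
    - --dtype=bfloat16
    - --enforce-eager
    - --tensor-parallel-size=1     # assumed value for a single accelerator
gaudi3:
  resources:
    requests:
      memory: 256Gi
      habana.ai/gaudi: 1
    limits:
      memory: 512Gi
      habana.ai/gaudi: 1
  args:
    - --dtype=bfloat16             # assumed to carry over from the Gaudi profile
    - --max-model-len=16384        # 16K context length
```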

Documentation

  • docs/kserve-deployment-guide.md: Architecture, deployment steps, troubleshooting
  • docs/examples/kserve/QUICKSTART.md: 5-minute setup guide
  • core/helm-charts/kserve/README.md: Technical implementation details

Notes

  • Volumes (shm, tmp) always created regardless of PVC state to prevent runtime failures
  • Gaudi capabilities (SYS_PTRACE for profiler, IPC_LOCK for DMA) documented in values files
  • Uninstall tasks use k8s_info polling instead of fixed delays for reliable cleanup
  • Compatible with existing observability stack (Prometheus, Grafana) and APISIX gateway
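
To illustrate how the security settings and the always-present volumes fit together, here is a pod-spec fragment. With a read-only root filesystem the container needs writable mounts for `/tmp` and `/dev/shm`, which is presumably why these volumes are created unconditionally. The size limit is an assumed value, and the `add` list applies only to Gaudi deployments.

```yaml
# Illustrative pod-spec fragment; the shm size limit is an assumed value.
containers:
  - name: kserve-container
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
        add: ["SYS_PTRACE", "IPC_LOCK"]   # Gaudi only: profiler and DMA
    volumeMounts:
      - name: shm
        mountPath: /dev/shm
      - name: tmp
        mountPath: /tmp
volumes:
  - name: shm
    emptyDir:
      medium: Memory        # RAM-backed shared memory
      sizeLimit: 8Gi        # assumed value
  - name: tmp
    emptyDir: {}
```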

Original prompt

Deep analyse the repo and suggest the changes to add support for kserve with vllm backend. Provide ansible playbook integration.


Copilot AI and others added 4 commits January 30, 2026 10:21
Co-authored-by: amberjain1 <88656989+amberjain1@users.noreply.github.com>
…tion

Co-authored-by: amberjain1 <88656989+amberjain1@users.noreply.github.com>
Co-authored-by: amberjain1 <88656989+amberjain1@users.noreply.github.com>
Co-authored-by: amberjain1 <88656989+amberjain1@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for KServe with vLLM backend" to "Add KServe support with vLLM backend and Ansible integration" Jan 30, 2026
Copilot AI requested a review from amberjain1 January 30, 2026 10:33