
Feat/genai Add GenAI Cost Visibility Plugin with Token- and MIG-Aware Efficiency Metrics (Sidecar Architecture)#69

Open
nXtCyberNet wants to merge 6 commits into opencost:main from nXtCyberNet:feat/genai

Conversation

@nXtCyberNet nXtCyberNet commented Jan 29, 2026

This PR introduces a GenAI Cost Visibility Plugin for OpenCost, implemented as a sidecar using the HashiCorp gRPC plugin system.

The plugin enriches OpenCost allocations with GenAI-specific efficiency signals by joining:

  • existing OpenCost cost allocations (CPU / RAM / GPU), and
  • external GenAI telemetry (tokens, GPU active time, utilization),

without modifying OpenCost’s core allocation logic.

The design is opt-in, stateless, vendor-neutral, and compatible with shared GPUs and MIG-based deployments.


Motivation / Problem

LLM and GenAI workloads consume expensive GPU resources, but are traditionally measured only as $/hour spend.

This prevents platform and FinOps teams from understanding efficiency, as cost is not linked to output.

Before this change, users could not answer:

  • How many tokens/sec are generated per GPU or pod?
  • What is the cost per token for inference vs training?
  • Which model or phase delivers the best efficiency per dollar?
  • Where is GPU spend driven by real demand vs idle reservation?

This PR enables productivity-per-cost visibility for GenAI workloads.


High-Level Design

  • Implemented as a standalone plugin process
  • Communicates with OpenCost over gRPC via Unix Domain Socket
  • Uses OpenCost CustomCostProvider interface
  • All GenAI logic isolated from OpenCost core
  • All efficiency metrics derived at query time

Architecture

Plugin Entry Point (main.go)

  • Plugin bootstrap and gRPC server setup
  • Configuration loading
  • Secure handshake using magic cookies
  • Service registration with HashiCorp go-plugin

Configuration Layer (genai-config.json, config.go)

  • Prometheus connection configuration
  • Metric name customization and overrides
  • Support for vLLM and custom GenAI runtimes
  • Annotation-based overrides with safe fallbacks

Protocol Layer (custom_cost.proto)

  • gRPC contract definition
  • Strongly typed request/response models

Provider / Interface Layer (provider.go, interface.go)

  • Plugin framework integration
  • gRPC client/server wrappers
  • Data transformation between protobuf and internal models

Efficiency Calculation Engine (helpers.go)

Pure, stateless computation layer:

  • Token cost normalization (per 1M tokens)
  • Tokens per GPU-second
  • GPU utilization and waste calculation
  • Efficiency classification (Underutilized / Optimal / Saturated)
  • MIG-aware normalization
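The computations above are pure functions of (cost, tokens, GPU-seconds, utilization). A sketch under assumed names — the 0.30/0.85 classification thresholds are illustrative, not the ones in `helpers.go`:

```go
package main

import "fmt"

// costPerMillionTokens normalizes spend to $ per 1M tokens.
func costPerMillionTokens(costUSD, tokens float64) float64 {
	if tokens == 0 {
		return 0 // no output observed; avoid division by zero
	}
	return costUSD / tokens * 1e6
}

// tokensPerGPUSecond measures throughput per unit of GPU active time.
func tokensPerGPUSecond(tokens, gpuSeconds float64) float64 {
	if gpuSeconds == 0 {
		return 0
	}
	return tokens / gpuSeconds
}

// classify buckets GPU utilization (0..1) into the three efficiency
// classes; thresholds here are illustrative.
func classify(util float64) string {
	switch {
	case util < 0.30:
		return "Underutilized"
	case util > 0.85:
		return "Saturated"
	default:
		return "Optimal"
	}
}

func main() {
	fmt.Println(costPerMillionTokens(12.0, 4000000)) // ≈ $3 per 1M tokens
	fmt.Println(classify(0.5))                       // Optimal
}
```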

Data Integration Layer (join.go)

  • Pod → node → GPU correlation
  • GenAI metadata extraction from annotations
  • MIG capacity mapping
  • Main workload aggregation logic
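The join can be sketched as keying telemetry and allocations on a shared identifier; the types below are simplified stand-ins for the OpenCost allocation and the scraped GenAI samples (the real `join.go` also correlates node names and GPU/MIG UUIDs):

```go
package main

import "fmt"

// Allocation is a simplified OpenCost cost allocation.
type Allocation struct {
	Pod, Node  string
	GPUCostUSD float64
}

// Telemetry is a simplified GenAI sample (e.g. from llm_tokens_emitted_total).
type Telemetry struct {
	Pod    string
	Tokens float64
}

// joinByPod attaches token counts to allocations keyed by pod name and
// returns $ per 1M tokens for each pod that emitted tokens.
func joinByPod(allocs []Allocation, tel []Telemetry) map[string]float64 {
	tokens := make(map[string]float64)
	for _, t := range tel {
		tokens[t.Pod] += t.Tokens
	}
	perMillion := make(map[string]float64)
	for _, a := range allocs {
		if tk := tokens[a.Pod]; tk > 0 {
			perMillion[a.Pod] = a.GPUCostUSD / tk * 1e6
		}
	}
	return perMillion
}

func main() {
	out := joinByPod(
		[]Allocation{{Pod: "vllm-0", Node: "gpu-node-1", GPUCostUSD: 6}},
		[]Telemetry{{Pod: "vllm-0", Tokens: 2000000}},
	)
	fmt.Println(out["vllm-0"]) // ≈ 3 ($ per 1M tokens)
}
```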

MIG Support (mig.go)

  • GPU slice utilization tracking
  • Capacity-aware cost allocation
  • Node-level GPU waste aggregation
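Capacity-aware allocation can be sketched as splitting the parent GPU's cost by each slice's compute fraction, then pricing the idle portion as waste (this is the equal-partition assumption noted under limitations; names here are illustrative, not from `mig.go`):

```go
package main

import "fmt"

// migSlice is a simplified MIG partition: compute capacity as a
// fraction of the parent GPU, plus an observed utilization.
type migSlice struct {
	CapacityFraction float64 // e.g. a 1g profile on a 7-slice GPU ≈ 1/7
	Utilization      float64 // 0..1
}

// sliceCost allocates the parent GPU's hourly cost proportionally to
// compute capacity.
func sliceCost(gpuCostPerHour float64, s migSlice) float64 {
	return gpuCostPerHour * s.CapacityFraction
}

// sliceWaste is the cost of the idle portion of a slice; summing this
// over a node's slices gives node-level GPU waste.
func sliceWaste(gpuCostPerHour float64, s migSlice) float64 {
	return sliceCost(gpuCostPerHour, s) * (1 - s.Utilization)
}

func main() {
	s := migSlice{CapacityFraction: 1.0 / 7, Utilization: 0.4}
	fmt.Printf("cost=%.3f waste=%.3f\n", sliceCost(2.8, s), sliceWaste(2.8, s))
}
```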

Data Sources (prom_source.go, scrape_source.go)

  • Prometheus-based telemetry ingestion
  • Direct metrics scraping fallback
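For the Prometheus path, ingestion presumably issues instant queries against the Prometheus HTTP API; the exact PromQL below is an assumption, shown only to illustrate how a windowed token query could be formed:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildQueryURL forms a Prometheus HTTP API instant-query URL summing
// token increases per pod over the given window. The PromQL here is
// illustrative, not copied from prom_source.go.
func buildQueryURL(base, window string) string {
	q := fmt.Sprintf(`sum by (pod) (increase(llm_tokens_emitted_total[%s]))`, window)
	v := url.Values{}
	v.Set("query", q)
	return base + "/api/v1/query?" + v.Encode()
}

func main() {
	fmt.Println(buildQueryURL("http://prometheus:9090", "1h"))
}
```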

Implemented GenAI Telemetry

Metrics

  • llm_tokens_emitted_total
  • llm_gpu_seconds_total
  • llm_cpu_seconds_total (optional)

Attributes

  • workflow.phase
  • gen_ai.model.name
  • gen_ai.model.version
  • tenant.id / cost_center
  • accelerator.type
  • gpu.uuid / mig.uuid
  • Kubernetes workload context

Limitations & Known Constraints

Unequal MIG Memory Partitioning

When MIG slices are partitioned with unequal memory, cost attribution may be inaccurate: the current normalization assumes slices are proportional by compute capacity and does not weight by memory share.
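A small worked illustration of the gap, with invented numbers: two slices with equal compute fractions but 10 GiB vs 30 GiB of memory are priced identically today, while the memory-weighted split listed under future work would price them differently.

```go
package main

import "fmt"

// computeProportional prices a slice by compute fraction alone
// (the current behavior).
func computeProportional(gpuCostPerHour, computeFrac float64) float64 {
	return gpuCostPerHour * computeFrac
}

// memoryWeighted is the future-work alternative: price by memory share.
func memoryWeighted(gpuCostPerHour, memGiB, totalMemGiB float64) float64 {
	return gpuCostPerHour * memGiB / totalMemGiB
}

func main() {
	// Two slices with equal compute but 10 GiB vs 30 GiB of memory,
	// on a $2/h parent GPU (all numbers illustrative).
	gpuCost := 2.0
	fmt.Printf("slice0: proportional=$%.2f memory-weighted=$%.2f\n",
		computeProportional(gpuCost, 0.5), memoryWeighted(gpuCost, 10, 40))
	fmt.Printf("slice1: proportional=$%.2f memory-weighted=$%.2f\n",
		computeProportional(gpuCost, 0.5), memoryWeighted(gpuCost, 30, 40))
}
```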

LLM Cache Handling

Cache efficiency is not implemented. Any cache-related logic is currently hard-coded / placeholder-only and not driven by telemetry.


Testing Status

End-to-end validation in a fully configured Kubernetes + OpenCost environment is pending due to local environment constraints.


Why This Approach

  • Keeps GenAI logic isolated and opt-in
  • Avoids changes to OpenCost core
  • Works with shared GPUs and MIG
  • Treats telemetry as best-effort
  • Derives metrics at query time

Scope

In Scope

  • Token-aware cost attribution
  • GPU and MIG efficiency metrics
  • Sidecar plugin architecture

Out of Scope

  • Cache savings attribution
  • Persistent metric storage

Future Work (Non-Blocking)

  • Memory-weighted MIG normalization
  • Standardized OTEL GenAI metrics
  • Cache efficiency once portable telemetry exists

Related to opencost/opencost#3533

Signed-off-by: Rohan Dev <rohantech2005@gmail.com>