Feat/genai Add GenAI Cost Visibility Plugin with Token- and MIG-Aware Efficiency Metrics (Sidecar Architecture) #69
Open
nXtCyberNet wants to merge 6 commits into opencost:main
Signed-off-by: Rohan Dev <rohantech2005@gmail.com>
This PR introduces a GenAI Cost Visibility Plugin for OpenCost, implemented as a sidecar using the HashiCorp gRPC plugin system.
The plugin enriches OpenCost allocations with GenAI-specific efficiency signals by joining allocation data with GenAI telemetry, without modifying OpenCost’s core allocation logic.
The design is opt-in, stateless, vendor-neutral, and compatible with shared GPUs and MIG-based deployments.
Motivation / Problem
LLM and GenAI workloads consume expensive GPU resources, but are traditionally measured only as $/hour spend.
This prevents platform and FinOps teams from understanding efficiency, as cost is not linked to output.
Before this change, users could not answer:
This PR enables productivity-per-cost visibility for GenAI workloads.
High-Level Design
CustomCostProvider interface
Architecture
Plugin Entry Point (main.go)
go-plugin
Configuration Layer (genai-config.json, config.go)
Protocol Layer (custom_cost.proto)
Provider / Interface Layer (provider.go, interface.go)
Efficiency Calculation Engine (helpers.go)
Pure, stateless computation layer:
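A pure, stateless efficiency calculation of this kind could look roughly like the sketch below. It is not taken from helpers.go: the `Sample` and `Efficiency` types, the `ComputeEfficiency` function, and the two derived ratios (cost per 1,000 tokens, tokens per GPU-second) are assumptions chosen for illustration. Only the metric names in the comments come from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Sample is a hypothetical input record: one workload's cost and telemetry
// over a single window. Field names are illustrative, not the plugin's types.
type Sample struct {
	CostUSD    float64 // allocated cost for the window
	Tokens     float64 // increase of llm_tokens_emitted_total over the window
	GPUSeconds float64 // increase of llm_gpu_seconds_total over the window
}

// Efficiency holds the derived signals (assumed ratios, not the PR's exact set).
type Efficiency struct {
	CostPer1KTokens    float64
	TokensPerGPUSecond float64
}

// ErrNoOutput is returned when a ratio would divide by zero.
var ErrNoOutput = errors.New("no tokens or GPU time recorded in window")

// ComputeEfficiency is pure: it performs no I/O and keeps no state between calls.
func ComputeEfficiency(s Sample) (Efficiency, error) {
	if s.Tokens <= 0 || s.GPUSeconds <= 0 {
		return Efficiency{}, ErrNoOutput
	}
	return Efficiency{
		CostPer1KTokens:    s.CostUSD * 1000 / s.Tokens,
		TokensPerGPUSecond: s.Tokens / s.GPUSeconds,
	}, nil
}

func main() {
	e, err := ComputeEfficiency(Sample{CostUSD: 4, Tokens: 2048, GPUSeconds: 512})
	if err != nil {
		fmt.Println("skipped:", err)
		return
	}
	fmt.Printf("$/1k tokens: %.4f, tokens/GPU-second: %.1f\n",
		e.CostPer1KTokens, e.TokensPerGPUSecond)
}
```

Returning an error for empty windows, rather than zero or infinity, keeps idle workloads from showing up as infinitely expensive or infinitely efficient; whether the plugin does this is not stated here.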
Data Integration Layer (join.go)
MIG Support (mig.go)
Data Sources (prom_source.go, scrape_source.go)
Implemented GenAI Telemetry
Metrics
- llm_tokens_emitted_total
- llm_gpu_seconds_total
- llm_cpu_seconds_total (optional)
Attributes
- workflow.phase
- gen_ai.model.name
- gen_ai.model.version
- tenant.id / cost_center
- accelerator.type
- gpu.uuid / mig.uuid
Limitations & Known Constraints
Unequal MIG Memory Partitioning
When MIG slices have unequal RAM partitioning, cost attribution may be inaccurate due to proportional normalization assumptions.
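To make this limitation concrete, the sketch below splits one GPU's cost across two MIG slices in proportion to a single weight, first by compute units and then by memory. The PR does not state which quantity mig.go normalizes by, so the `Slice` type, the `SplitCost` function, the slice IDs, and the numbers are all hypothetical. The point is only that when compute and memory fractions differ, the two weightings disagree.

```go
package main

import "fmt"

// Slice is a hypothetical description of one MIG slice.
type Slice struct {
	ID           string
	ComputeUnits float64 // share of the GPU's compute slices
	MemoryGB     float64 // memory assigned to the slice
}

// SplitCost divides gpuCost across slices in proportion to weight(slice).
// It returns an empty map when the total weight is zero.
func SplitCost(gpuCost float64, slices []Slice, weight func(Slice) float64) map[string]float64 {
	total := 0.0
	for _, s := range slices {
		total += weight(s)
	}
	out := make(map[string]float64, len(slices))
	if total == 0 {
		return out
	}
	for _, s := range slices {
		out[s.ID] = gpuCost * weight(s) / total
	}
	return out
}

func main() {
	// Illustrative values: the memory ratio (1:4) differs from the compute ratio (1:3).
	slices := []Slice{
		{ID: "mig-a", ComputeUnits: 1, MemoryGB: 5},
		{ID: "mig-b", ComputeUnits: 3, MemoryGB: 20},
	}
	byCompute := SplitCost(100, slices, func(s Slice) float64 { return s.ComputeUnits })
	byMemory := SplitCost(100, slices, func(s Slice) float64 { return s.MemoryGB })
	for _, s := range slices {
		fmt.Printf("%s: by compute %.2f, by memory %.2f\n",
			s.ID, byCompute[s.ID], byMemory[s.ID])
	}
}
```

With these inputs, the slice holding a quarter of the compute holds only a fifth of the memory, so any single proportional weight misattributes part of the cost along the other dimension.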
LLM Cache Handling
Cache efficiency is not implemented. Any cache-related logic is currently hard-coded / placeholder-only and not driven by telemetry.
Testing Status
End-to-end validation in a fully configured Kubernetes + OpenCost environment is pending due to local environment constraints.
Why This Approach
Scope
In Scope
Out of Scope
Future Work (Non-Blocking)
Related to opencost/opencost#3533