[AURON #1985] Optimize native metrics retrieval by passing keys directly#1982
[AURON #1985] Optimize native metrics retrieval by passing keys directly#1982lsm1 wants to merge 1 commit intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes native SQL metric registration by letting operators request only the specific native metric keys they need, instead of always creating the full default metric set and filtering afterward.
Changes:
- Refactors
NativeHelper.getDefaultNativeMetricsto accept an explicitSet[String]of metric keys and create only those metrics. - Updates native Spark plan operators to pass their required metric keys directly to
getDefaultNativeMetrics. - Updates native file scan metrics construction to use the new keyed API.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeWindowBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeUnionBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeTakeOrderedBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeSortMergeJoinBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeSortBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeShuffledHashJoinBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeProjectBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeParquetInsertIntoHiveTableBase.scala | Passes explicit metric key set to getDefaultNativeMetrics for native write path metrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeLocalLimitBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGlobalLimitBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGenerateBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeFilterBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeExpandBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastJoinBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastExchangeBase.scala | Switches from “all defaults” to an explicit full key set when registering native metrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeAggBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/ConvertToNativeBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeHelper.scala | Introduces keyed metric creation via defaultNativeMetricCreators; updates file scan metric assembly. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeShuffleExchangeExec.scala | Passes explicit spill/shuffle metric key set to getDefaultNativeMetrics for shuffle write metrics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| keys -- Set("input_batch_count", "input_row_count", "input_batch_mem_size") | ||
| } | ||
|
|
||
| TreeMap[String, SQLMetric]() ++ enabledKeys.flatMap { key => |
There was a problem hiding this comment.
enabledKeys is a Set, so enabledKeys.flatMap { ... } produces a Set[(String, SQLMetric)]. That forces hashing/equality on SQLMetric instances and allocates an intermediate Set before building the TreeMap. To keep the optimization goal, consider iterating (enabledKeys.iterator.flatMap(...)) and accumulating directly into the TreeMap (or using enabledKeys.iterator.map(...) + collect), which avoids building a Set of metrics first.
| TreeMap[String, SQLMetric]() ++ enabledKeys.flatMap { key => | |
| TreeMap[String, SQLMetric]() ++ enabledKeys.iterator.flatMap { key => |
| "input_batch_count" -> (sc => SQLMetrics.createMetric(sc, "Native.input_batches")), | ||
| "input_row_count" -> (sc => SQLMetrics.createMetric(sc, "Native.input_rows")), | ||
| "input_batch_mem_size" -> (sc => SQLMetrics.createSizeMetric(sc, "Native.input_mem_bytes"))) | ||
|
|
There was a problem hiding this comment.
Changing getDefaultNativeMetrics from a 1-arg method to a 2-arg method is a source/binary breaking change for any downstream code compiled against this module. If this object is part of a published API surface, consider keeping an overloaded getDefaultNativeMetrics(sc: SparkContext) (possibly deprecated) that delegates to the new implementation with the full default key set.
| def getDefaultNativeMetrics(sc: SparkContext): Map[String, SQLMetric] = { | |
| getDefaultNativeMetrics(sc, defaultNativeMetricCreators.keySet) | |
| } |
Which issue does this PR close?
Closes #1985
Rationale for this change
The previous native metrics retrieval fetched all default metrics for every operator. This changes allows specific operators to request only the relevant metric keys, reducing unnecessary overhead.
What changes are included in this PR?
Updated
getDefaultNativeMetricsto accept aSet[String]of metric keys instead of returning a fixed map.Are there any user-facing changes?
No
How was this patch tested?