Skip to content

[AURON #1985] Optimize native metrics retrieval by passing keys directly#1982

Open
lsm1 wants to merge 1 commit intoapache:masterfrom
lsm1:features/optimize-metrics
Open

[AURON #1985] Optimize native metrics retrieval by passing keys directly#1982
lsm1 wants to merge 1 commit intoapache:masterfrom
lsm1:features/optimize-metrics

Conversation

@lsm1
Copy link

@lsm1 lsm1 commented Feb 3, 2026

Which issue does this PR close?

Closes #1985

Rationale for this change

The previous native metrics retrieval fetched all default metrics for every operator. This changes allows specific operators to request only the relevant metric keys, reducing unnecessary overhead.

What changes are included in this PR?

Updated getDefaultNativeMetrics to accept a Set[String] of metric keys instead of returning a fixed map.

Are there any user-facing changes?

No

How was this patch tested?

@github-actions github-actions bot added the spark label Feb 3, 2026
@cxzl25 cxzl25 requested a review from Copilot February 4, 2026 07:12
@cxzl25 cxzl25 changed the title Optimize native metrics retrieval by passing keys directly [AURON #1985] Optimize native metrics retrieval by passing keys directly Feb 4, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes native SQL metric registration by letting operators request only the specific native metric keys they need, instead of always creating the full default metric set and filtering afterward.

Changes:

  • Refactors NativeHelper.getDefaultNativeMetrics to accept an explicit Set[String] of metric keys and create only those metrics.
  • Updates native Spark plan operators to pass their required metric keys directly to getDefaultNativeMetrics.
  • Updates native file scan metrics construction to use the new keyed API.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeWindowBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeUnionBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeTakeOrderedBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeSortMergeJoinBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeSortBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeShuffledHashJoinBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeProjectBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeParquetInsertIntoHiveTableBase.scala Passes explicit metric key set to getDefaultNativeMetrics for native write path metrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeLocalLimitBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGlobalLimitBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGenerateBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeFilterBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeExpandBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastJoinBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastExchangeBase.scala Switches from “all defaults” to an explicit full key set when registering native metrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeAggBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/ConvertToNativeBase.scala Passes explicit metric key set to getDefaultNativeMetrics.
spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeHelper.scala Introduces keyed metric creation via defaultNativeMetricCreators; updates file scan metric assembly.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeShuffleExchangeExec.scala Passes explicit spill/shuffle metric key set to getDefaultNativeMetrics for shuffle write metrics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

keys -- Set("input_batch_count", "input_row_count", "input_batch_mem_size")
}

TreeMap[String, SQLMetric]() ++ enabledKeys.flatMap { key =>
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enabledKeys is a Set, so enabledKeys.flatMap { ... } produces a Set[(String, SQLMetric)]. That forces hashing/equality on SQLMetric instances and allocates an intermediate Set before building the TreeMap. To keep the optimization goal, consider iterating (enabledKeys.iterator.flatMap(...)) and accumulating directly into the TreeMap (or using enabledKeys.iterator.map(...) + collect), which avoids building a Set of metrics first.

Suggested change
TreeMap[String, SQLMetric]() ++ enabledKeys.flatMap { key =>
TreeMap[String, SQLMetric]() ++ enabledKeys.iterator.flatMap { key =>

Copilot uses AI. Check for mistakes.
"input_batch_count" -> (sc => SQLMetrics.createMetric(sc, "Native.input_batches")),
"input_row_count" -> (sc => SQLMetrics.createMetric(sc, "Native.input_rows")),
"input_batch_mem_size" -> (sc => SQLMetrics.createSizeMetric(sc, "Native.input_mem_bytes")))

Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing getDefaultNativeMetrics from a 1-arg method to a 2-arg method is a source/binary breaking change for any downstream code compiled against this module. If this object is part of a published API surface, consider keeping an overloaded getDefaultNativeMetrics(sc: SparkContext) (possibly deprecated) that delegates to the new implementation with the full default key set.

Suggested change
def getDefaultNativeMetrics(sc: SparkContext): Map[String, SQLMetric] = {
getDefaultNativeMetrics(sc, defaultNativeMetricCreators.keySet)
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize native metrics retrieval by passing keys directly

1 participant