[SPARK-55077][CORE][K8S]: Support spark.kubernetes.archives.avoidDownloadSchemes for K8s Cluster Mode #53845
base: master
…loadSchemes for K8s Cluster Mode

Similar to SPARK-47475 for jars, this commit adds support for avoiding archive downloads in Kubernetes cluster mode when the archives are big and executor counts are high, to prevent network saturation and timeouts.

Changes:
- Add KUBERNETES_ARCHIVES_AVOID_DOWNLOAD_SCHEMES configuration
- Implement avoidArchiveDownload function in SparkSubmit
- Add test case to verify archives avoid download functionality

The configuration accepts a comma-separated list of schemes (e.g., s3a, hdfs) or wildcard '*' to avoid downloading archives for any scheme.
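To make the scheme rule concrete, here is a minimal, standalone sketch of the decision the commit message describes: an archive whose URI scheme appears in the configured list (or any archive, if the list contains '*') is left as a remote URI instead of being downloaded to the driver first. The object and method names (`AvoidArchiveDownloadSketch`, `avoidDownload`, `partitionArchives`) are hypothetical and are not the PR's `avoidArchiveDownload`; treating a scheme-less path as `file` is also an assumption.

```scala
import java.net.URI

// Hypothetical sketch of the scheme-based decision; not the PR's implementation.
object AvoidArchiveDownloadSketch {
  /** True if the archive should be left remote rather than downloaded to the driver. */
  def avoidDownload(archive: String, avoidSchemes: Seq[String]): Boolean = {
    // Assumption: a path without a scheme is treated as a local "file" URI.
    // getScheme ignores the "#alias" fragment Spark allows on archive URIs.
    val scheme = Option(new URI(archive).getScheme).getOrElse("file")
    avoidSchemes.contains("*") || avoidSchemes.contains(scheme)
  }

  /** Splits archives into (leftRemote, downloadedToDriver), preserving order. */
  def partitionArchives(
      archives: Seq[String],
      avoidSchemes: Seq[String]): (Seq[String], Seq[String]) =
    archives.partition(avoidDownload(_, avoidSchemes))

  def main(args: Array[String]): Unit = {
    val archives = Seq(
      "s3a://bucket/pyenv.tar.gz#env",
      "hdfs:///data/model.zip",
      "https://host/tools.zip")
    val (remote, download) = partitionArchives(archives, Seq("s3a", "hdfs"))
    println(s"left remote: ${remote.mkString(", ")}")
    println(s"downloaded:  ${download.mkString(", ")}")
  }
}
```

Using `partition` keeps a single pass over the archive list and makes it easy to see which archives would still go through the driver.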
JIRA Issue Information: Improvement SPARK-55077 (this comment was automatically generated by GitHub Actions)
```scala
        "For use in cases when the archives are big and executor counts are high, " +
        "concurrent download causes network saturation and timeouts. " +
        "Wildcard '*' is denoted to not downloading archives for any the schemes.")
      .version("4.0.0")
```
Since the Apache Spark master branch is 4.2.0-SNAPSHOT, the version of the new configuration should be 4.2.0, @xumanbu.
```scala
  }
}

test("Avoid archives download if scheme matches " +
```
Please use the test name prefix style (prefix the name with the JIRA ID).
```diff
- test("Avoid archives download if scheme matches " +
+ test("SPARK-55077: Avoid archives download if scheme matches " +
```
Got it. I'll fix it.
@dongjoon-hyun I have addressed all the comments; please take a look. The build failure may be caused by PR #53720; it is not caused by this PR.
What changes were proposed in this pull request?
Similar to SPARK-47475 for jars, this commit adds support for avoiding archive downloads in Kubernetes cluster mode when the archives are big and executor counts are high, to prevent network saturation and timeouts.
Why are the changes needed?
Does this PR introduce any user-facing change?
Changes:
The configuration accepts a comma-separated list of schemes (e.g., s3a, hdfs) or wildcard '*' to avoid downloading archives for any scheme.
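As an illustration of the accepted values, the sketch below turns a raw value such as `" s3a, hdfs ,"` into a list of schemes and detects the '*' wildcard. It assumes the value is parsed the way comma-separated string configs commonly are (split on commas, trim whitespace, drop empty entries); the object and method names are hypothetical and not taken from the PR.

```scala
// Hypothetical sketch: parsing the raw value of
// spark.kubernetes.archives.avoidDownloadSchemes into a list of schemes.
// Assumption: split on ',', trim each entry, and drop empty entries.
object AvoidSchemesConfSketch {
  def parseSchemes(raw: String): Seq[String] =
    raw.split(",").map(_.trim).filter(_.nonEmpty).toList

  /** True when the wildcard '*' is present, i.e. archives of any scheme are not downloaded. */
  def avoidsAllSchemes(schemes: Seq[String]): Boolean = schemes.contains("*")

  def main(args: Array[String]): Unit = {
    val schemes = parseSchemes(" s3a, hdfs ,")
    println(schemes.mkString("|"))
    println(avoidsAllSchemes(parseSchemes("*")))
  }
}
```

Trimming and dropping empty entries keeps values like `"s3a, hdfs"` or a trailing comma from producing schemes that never match.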
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
NO