[SPARK-55107] Log TID for scanned file in FileScanRDD #53876

turboFei · 2026-01-20T23:13:52Z

What changes were proposed in this pull request?

Log TID for scanned file in FileScanRDD.

Why are the changes needed?

Similar to #46966.

User story:
When we encounter Parquet file corruption in production, we typically have the Task ID (TID) from the error message. However, executor logs interleave messages from multiple tasks running concurrently. Without the TID in the file scan logs, we cannot easily determine which files the failing task scanned, which makes it difficult to narrow down which Parquet file is corrupted.

By logging the TID alongside the scanned file path, we can now filter executor logs by the failing task's TID and quickly identify the exact set of files that were processed by that task.
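As a rough illustration of that workflow, here is a minimal Python sketch that filters executor log lines by TID and collects the file paths. The log line shape used here (`TID <n>` followed by `Reading File path: <path>,`) is an assumption made for the example, not the exact format this PR emits; `files_scanned_by_task` and the sample paths are hypothetical.

```python
import re

def files_scanned_by_task(log_lines, tid):
    """Return file paths from scan log lines that mention the given TID.

    Assumes (hypothetically) that each scan line contains "TID <n>" and
    "Reading File path: <path>,". Adjust both patterns to the real format.
    """
    # Word boundaries keep TID 41 from also matching TID 410.
    tid_pattern = re.compile(r"\bTID " + str(int(tid)) + r"\b")
    path_pattern = re.compile(r"Reading File path: ([^,]+),")
    paths = []
    for line in log_lines:
        if tid_pattern.search(line):
            match = path_pattern.search(line)
            if match:
                paths.append(match.group(1))
    return paths

# Made-up sample lines from two interleaved tasks plus one unrelated line.
sample_log = [
    "INFO FileScanRDD: TID 41 Reading File path: hdfs://nn/t/part-0001.parquet, range: 0-1024",
    "INFO FileScanRDD: TID 410 Reading File path: hdfs://nn/t/part-0002.parquet, range: 0-2048",
    "INFO Executor: Running task 3.0 in stage 2.0 (TID 41)",
    "INFO FileScanRDD: TID 41 Reading File path: hdfs://nn/t/part-0003.parquet, range: 0-512",
]

scanned = files_scanned_by_task(sample_log, 41)
```

With the sample lines above, `scanned` holds only the two paths read by TID 41, which is the candidate set to check for corruption.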

Does this PR introduce any user-facing change?

No.

How was this patch tested?

By code review only, as the change only touches log contents.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions · 2026-01-20T23:14:01Z

JIRA Issue Information

=== Improvement SPARK-55107 ===
Summary: Log TID for scanned file in FileScanRDD
Assignee: Cheng Pan
Status: Open
Affected: ["4.1.1"]

This comment was automatically generated by GitHub Actions

github-actions bot added the SQL label Jan 20, 2026

turboFei closed this Jan 20, 2026

turboFei reopened this Jan 20, 2026

[SPARK-55107] Log TID for scanned file in FileScanRDD

3d9822c

turboFei force-pushed the filescan_tid branch from 592bb61 to 3d9822c Compare January 20, 2026 23:20
