fix: Added per rank log file for ODM #168

Merged
kmehant merged 4 commits into foundation-model-stack:main from romitjain:bugfix/odm-single-file-write
Jan 30, 2026

Conversation

@romitjain
Collaborator

This PR fixes a bug where multiple processes wrote to the same log file, causing intermittent OSError.
Each rank now writes to its own file instead of all ranks sharing a single file. The number of log entries written is the same as before, but they are split across per-rank files.

Since ODM sampling with the accelerate dataloader can get quite involved (dispatch batches, split batches), it is better to keep per-rank log files for debugging than to allow only rank 0 to log.
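
For readers skimming the thread, the per-rank idea can be sketched roughly as below. This is not the code from the PR: the function names, the `odm_log_rank{N}.jsonl` naming scheme, and the JSON-lines format are all assumptions made for illustration. The only external convention relied on is the `RANK` environment variable that torchrun-style launchers set for each process.

```python
import json
import os
from pathlib import Path
from typing import Optional


def rank_log_path(log_dir: str, rank: int, stem: str = "odm_log") -> Path:
    """Build a log path unique to one rank (hypothetical naming scheme)."""
    return Path(log_dir) / f"{stem}_rank{rank}.jsonl"


def append_log(log_dir: str, record: dict, rank: Optional[int] = None) -> Path:
    """Append one JSON record to the calling rank's own log file."""
    if rank is None:
        # Assumption: the launcher exports RANK; default to 0 for single-process runs.
        rank = int(os.environ.get("RANK", "0"))
    path = rank_log_path(log_dir, rank)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Each rank opens a different file, so no two processes share a file handle target.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return path
```

With this layout, the total number of records is unchanged; they are simply partitioned by rank, and a debugging session can concatenate or compare the per-rank files as needed.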

Signed-off-by: romit <romit@ibm.com>
@romitjain romitjain requested a review from kmehant January 30, 2026 05:39
Signed-off-by: romit <romit@ibm.com>
Signed-off-by: romit <romit@ibm.com>
@romitjain romitjain changed the title bugfix: Added per rank log file for ODM fix: Added per rank log file for ODM Jan 30, 2026
@kmehant kmehant merged commit 8c990a1 into foundation-model-stack:main Jan 30, 2026
9 checks passed
@romitjain romitjain deleted the bugfix/odm-single-file-write branch January 30, 2026 11:50

2 participants