
Fix syntax errors and stabilize dataset preprocessing/tokenizer creation #19

Open
Ankitaghavate wants to merge 1 commit into ML4SCI:main from Ankitaghavate:fix-preprocessing-and-tokenizer-errors

@Ankitaghavate

Summary

This PR fixes syntax errors and other runtime-breaking issues in the dataset preprocessing and tokenizer creation pipeline without changing the original logic.

Changes

  • Fixed syntax errors in print() statements
  • Removed non-Python text causing runtime failure
  • Ensured dataset directories are created before saving files
  • Added explicit encoding handling when reading CSV files
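The diff itself is not reproduced here, so the following is only a hypothetical sketch of the kind of fixes listed above, assuming a Python 3 pipeline that reads and writes CSV files. The function names `save_rows` and `read_rows`, the UTF-8 default, and the latin-1 fallback are illustrative assumptions, not the repository's actual code.

```python
import csv
import os


def save_rows(rows, out_path):
    """Hypothetical helper: write rows to a CSV file, creating its directory first."""
    # Create the parent directory before writing so open() does not fail with
    # FileNotFoundError when the dataset folder does not exist yet.
    out_dir = os.path.dirname(out_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    # In Python 3, print() is a function call; the Python 2 statement form
    # `print "Saved", out_path` is a SyntaxError.
    print("Saved", len(rows), "rows to", out_path)


def read_rows(path):
    """Hypothetical helper: read a CSV file with an explicit encoding."""
    # Use UTF-8 explicitly instead of the platform default encoding.
    try:
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.reader(f))
    except UnicodeDecodeError:
        # Assumed fallback: latin-1 maps every byte to a character, so it
        # never raises, though non-UTF-8 text may decode differently than intended.
        with open(path, newline="", encoding="latin-1") as f:
            return list(csv.reader(f))
```

Using `exist_ok=True` keeps the directory step idempotent, so rerunning preprocessing over an existing output folder behaves the same as a fresh run.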

Notes

  • No logic or data-processing behavior was modified
  • Changes are limited to bug fixes and stability improvements

Checklist

  • Code runs without syntax errors
  • No logic changes introduced
  • Existing structure preserved
