Refactor Dataset Pipeline: Modular LensDataset & Transforms #128

Open

BeathovenGala wants to merge 2 commits into ML4SCI:main from BeathovenGala:refactor/dataset-pipeline

Conversation

@BeathovenGala

Description for issue #126

The current implementation in `dataset/preprocessing_model_2.py` tightly couples data loading, hardcoded category logic (e.g., `if file_name.startswith('axion')`), and transformations. The result is a fragile, duplicated, and untestable codebase.

Issues with current approach:

  • Fragile: Breaks on new .npy datasets whose file names don't match the hardcoded prefixes.
  • Duplicated: Min-Max normalization logic is copied across 3 different files, multiplying the risk of bugs (e.g., unhandled division by zero).
  • Untestable: Loading and processing are not isolated, so transformations cannot be verified independently.

Fixes

This PR refactors the pipeline into modular, reusable components:

  • Introduced `LensDataset` (pure loading) and `WrapperDataset` (resolves categories dynamically via config, removing the hardcoded logic); see the sketch after this list.
  • Added `get_transforms(config)` for a modular, configuration-driven augmentation pipeline (second sketch below).
  • Added comprehensive test suites for the datasets and pipelines.
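
To make the split concrete, here is a minimal sketch of how the two pieces could fit together. Only the names `LensDataset` and `WrapperDataset` and the idea of a category config come from this PR; the constructor signatures, the `config["categories"]` key, and all internals below are illustrative assumptions rather than the actual implementation.

```python
import numpy as np
from pathlib import Path
from torch.utils.data import Dataset


class LensDataset(Dataset):
    """Pure loading: reads .npy files from a directory and nothing else."""

    def __init__(self, root):
        self.files = sorted(Path(root).glob("*.npy"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Return the raw array plus the file name; category logic and
        # transforms belong to the wrapper, not here.
        return np.load(self.files[idx]), self.files[idx].name


class WrapperDataset(Dataset):
    """Resolves categories dynamically from config instead of hardcoded prefixes."""

    def __init__(self, base, config, transform=None):
        self.base = base
        # Hypothetical config shape:
        # {"categories": {"axion": 0, "cdm": 1, "no_sub": 2}}
        self.categories = config["categories"]
        self.transform = transform

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        data, name = self.base[idx]
        # Dynamic lookup replaces `if file_name.startswith('axion')`.
        label = next(
            (lab for prefix, lab in self.categories.items()
             if name.startswith(prefix)),
            None,
        )
        if label is None:
            raise ValueError(f"No category configured for file {name!r}")
        if self.transform is not None:
            data = self.transform(data)
        return data, label
```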
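
Similarly, a configuration-driven `get_transforms(config)` could be sketched with torchvision; the config keys (`hflip`, `rotation_degrees`) are invented here for illustration and may differ from the PR's actual schema.

```python
from torchvision import transforms


def get_transforms(config):
    """Builds an augmentation pipeline from config (keys are hypothetical)."""
    ops = [transforms.ToTensor()]
    if config.get("hflip", False):
        ops.append(transforms.RandomHorizontalFlip(p=0.5))
    if config.get("rotation_degrees", 0):
        ops.append(transforms.RandomRotation(config["rotation_degrees"]))
    return transforms.Compose(ops)


# Example usage:
# train_tf = get_transforms({"hflip": True, "rotation_degrees": 15})
```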

Problem Demo

The Error (Existing Code):

```python
# Fragile: fails for files whose names don't start with 'axion'
if file_name.startswith('axion'):
    data_point = data_point[0]

# Duplicated: normalization logic repeated in 3 locations;
# the division-by-zero risk is not handled consistently
normalized = (data - min) / (max - min)
```
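
The Fix (Sketch):

With the refactor, a single shared helper can own the normalization. The name `min_max_normalize` and the epsilon guard below are illustrative, not necessarily the PR's exact code.

```python
import numpy as np


def min_max_normalize(data: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """One shared implementation; the epsilon guards against division
    by zero when the array is constant (max == min)."""
    d_min, d_max = data.min(), data.max()
    return (data - d_min) / max(d_max - d_min, eps)
```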
