-
Notifications
You must be signed in to change notification settings - Fork 10
feat(providers): add lifecycle hooks for offline execution setup #498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add optional lifecycle hooks to DiagnosticProvider that enable providers to prepare for offline execution on HPC compute nodes: - setup() - orchestrates all hooks in order - setup_environment() - environment setup (conda for CondaDiagnosticProvider) - fetch_data() - pre-fetch reference data to pooch cache - post_setup() - post-setup tasks - validate_setup() - verify setup is complete Provider implementations: - ESMValToolProvider: fetches "esmvaltool" registry - PMPDiagnosticProvider: fetches "pmp-climatology" registry - ILAMBProvider: fetches "ilamb-test", "ilamb", "iomb" registries CLI command `ref providers setup` added with options: - --provider: filter to single provider - --skip-env: skip environment setup - --skip-data: skip data fetching - --validate-only: only run validation All hooks are idempotent and safe to run multiple times.
Move requests.get() inside try/except block so network errors (e.g., ConnectionError in offline HPC environments) are caught and handled gracefully by creating an empty default file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds provider lifecycle hooks and a CLI entrypoint to pre-stage software environments and reference data for offline execution (e.g., on HPC compute nodes without internet).
Changes:
- Introduces
DiagnosticProviderlifecycle hooks (setup,setup_environment,fetch_data,post_setup,validate_setup) and implements them in several providers. - Adds
ref providers setupCLI command and updates the offline-solve CI workflow to use it. - Improves resilience to network failures when downloading the default ignore-datasets file and expands test coverage.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/climate-ref/tests/unit/test_config.py | Adds unit coverage for ignore-datasets download behavior on network failures. |
| packages/climate-ref/tests/unit/cli/test_providers.py | Adds CLI tests for the new providers setup command and flags. |
| packages/climate-ref/tests/integration/test_provider_setup.py | Adds integration tests asserting lifecycle hook idempotence/behavior. |
| packages/climate-ref/src/climate_ref/config.py | Fixes network-error handling by moving requests.get into the try block. |
| packages/climate-ref/src/climate_ref/cli/providers.py | Adds providers setup command to orchestrate environment/data setup and validation. |
| packages/climate-ref-pmp/src/climate_ref_pmp/init.py | Implements fetch_data() to prefetch the pmp-climatology registry. |
| packages/climate-ref-ilamb/src/climate_ref_ilamb/init.py | Introduces ILAMBProvider with fetch_data() to prefetch ILAMB registries. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/init.py | Introduces ESMValToolProvider with fetch_data() to prefetch ESMValTool registry. |
| packages/climate-ref-core/tests/unit/test_providers.py | Adds unit tests for lifecycle hook defaults and conda-provider setup/validation. |
| packages/climate-ref-core/src/climate_ref_core/testing.py | Adds NetworkBlockedError placeholder for offline-network testing support. |
| packages/climate-ref-core/src/climate_ref_core/providers.py | Adds lifecycle hook APIs to DiagnosticProvider and conda overrides for env setup/validation. |
| changelog/498.feature.md | Documents the new lifecycle hooks and ref providers setup command. |
| Makefile | Stops ignoring a (now absent) offline-solve integration test file. |
| .github/workflows/ci-offline-solve.yaml | Updates offline-solve workflow to run providers setup and adds caching and host directory mounting. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Replace `ref providers create-env` with `ref providers setup` as the recommended command throughout documentation - Add lifecycle hooks documentation for provider developers (setup_environment, fetch_data, post_setup, validate_setup) - Add offline execution warning to diagnostic implementation guide - Update HPC executor docs with concrete setup commands - Fix example provider README to use correct API (DiagnosticProvider) - Remove redundant provider-specific fetch commands from download guide (now handled by `ref providers setup`) - Update HPC prefetch script to use new workflow
Follow-up workAfter this PR merges, we should address the following in separate PRs:
Also ignore list. |
- Remove unused NetworkBlockedError exception (dead code) - Add skip_env/skip_data parameters to DiagnosticProvider.setup() - Use provider.setup() in CLI instead of manually calling hooks - Add validate_setup() call after setup to provide immediate feedback - Improve exception logging with stack traces via logger.opt(exception=True) - Add tests for new skip_env/skip_data parameters
| Run this on a login node with internet access before solving on compute nodes. | ||
|
|
||
| Examples | ||
| -------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't render nicely:
...
This command prepares all providers for offline execution by: 1. Creating conda environments (if applicable) 2. Fetching required reference datasets to pooch cache
All operations are idempotent and safe to run multiple times. Run this on a login node with internet access before solving on compute nodes.
Examples -------- # Setup all providers ref providers setup
# Setup only ESMValTool provider ref providers setup --provider esmvaltool
# Only validate, don't run setup ref providers setup --validate-only
...
|
|
||
|
|
||
| @app.command() | ||
| def setup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about deprecating the create-env command, since it will no longer be needed and may cause confusion?
It be nice if ref providers list also shows the paths where data is installed.
|
|
||
| def validate_setup(self, config: Config) -> bool: | ||
| """Validate conda environment exists.""" | ||
| return self.env_path.exists() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To really check that the pooch data has downloaded successfully, you could run the download again, that will compute a checksum for each file and compare it to the reference value.
| for provider_ in providers: | ||
| if validate_only: | ||
| is_valid = provider_.validate_setup(config) | ||
| status = "[green]valid[/green]" if is_valid else "[red]invalid[/red]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if this message was more informative: let the user know what is missing/wrong and how do they solve it? E.g. "Conda environment for PMP is not available at /path/to/software/conda/pmp-a0cd98ec96f7c858f788439a4f0b4951fe2d7a55, please run ref providers setup --provider pmp to install it.`
| """ | ||
| pass | ||
|
|
||
| def post_setup(self, config: Config) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe leave this one out until we actually need it?
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
- Deprecate create-env command with warning, recommend ref providers setup - Add data path column to ref providers list output - Add validate_registry_cache() for checksum verification of cached data - Providers now verify checksums in validate_setup() - Add informative error messages with paths and suggested commands - Remove unused post_setup() lifecycle hook - Simplify docstring formatting for better CLI rendering - Update CLAUDE.md to reference setup instead of create-env
Description
Add provider lifecycle hooks that enable providers to prepare for offline execution on HPC compute nodes where internet access is unavailable during solve operations.
New Lifecycle Hooks
Added to
DiagnosticProviderbase class:setup()- orchestrates all hooks in correct ordersetup_environment()- environment setup (e.g., conda)fetch_data()- pre-fetch reference data to pooch cachepost_setup()- post-setup tasksvalidate_setup()- verify setup is completeAll hooks are optional (default no-op) and idempotent (safe to run multiple times).
Provider Implementations
CLI Command
New
ref providers setupcommand with options:--provider: filter to single provider--skip-env: skip environment setup--skip-data: skip data fetching--validate-only: only validate setup statusUsage
Additional Changes
providers setupChecklist
Please confirm that this pull request has done the following:
changelog/