Skip to content

Conversation

@lewisjared
Copy link
Contributor

@lewisjared lewisjared commented Jan 28, 2026

Description

Add provider lifecycle hooks that enable providers to prepare for offline execution on HPC compute nodes where internet access is unavailable during solve operations.

New Lifecycle Hooks

Added to DiagnosticProvider base class:

  • setup() - orchestrates all hooks in correct order
  • setup_environment() - environment setup (e.g., conda)
  • fetch_data() - pre-fetch reference data to pooch cache
  • post_setup() - post-setup tasks
  • validate_setup() - verify setup is complete

All hooks are optional (default no-op) and idempotent (safe to run multiple times).

Provider Implementations

  • ESMValToolProvider: fetches "esmvaltool" registry data
  • PMPDiagnosticProvider: fetches "pmp-climatology" registry data
  • ILAMBProvider: fetches "ilamb-test", "ilamb", "iomb" registry data

CLI Command

New ref providers setup command with options:

  • --provider: filter to single provider
  • --skip-env: skip environment setup
  • --skip-data: skip data fetching
  • --validate-only: only validate setup status

Usage

# On login node with internet access:
ref providers setup

# Then on compute node without internet:
ref solve

Additional Changes

  • Updated CI offline-solve workflow to use providers setup
  • Fixed bug where network errors during ignore-datasets file fetch weren't caught

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

Add optional lifecycle hooks to DiagnosticProvider that enable providers
to prepare for offline execution on HPC compute nodes:

- setup() - orchestrates all hooks in order
- setup_environment() - environment setup (conda for CondaDiagnosticProvider)
- fetch_data() - pre-fetch reference data to pooch cache
- post_setup() - post-setup tasks
- validate_setup() - verify setup is complete

Provider implementations:
- ESMValToolProvider: fetches "esmvaltool" registry
- PMPDiagnosticProvider: fetches "pmp-climatology" registry
- ILAMBProvider: fetches "ilamb-test", "ilamb", "iomb" registries

CLI command `ref providers setup` added with options:
- --provider: filter to single provider
- --skip-env: skip environment setup
- --skip-data: skip data fetching
- --validate-only: only run validation

All hooks are idempotent and safe to run multiple times.
Move requests.get() inside try/except block so network errors (e.g.,
ConnectionError in offline HPC environments) are caught and handled
gracefully by creating an empty default file.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds provider lifecycle hooks and a CLI entrypoint to pre-stage software environments and reference data for offline execution (e.g., on HPC compute nodes without internet).

Changes:

  • Introduces DiagnosticProvider lifecycle hooks (setup, setup_environment, fetch_data, post_setup, validate_setup) and implements them in several providers.
  • Adds ref providers setup CLI command and updates the offline-solve CI workflow to use it.
  • Improves resilience to network failures when downloading the default ignore-datasets file and expands test coverage.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/climate-ref/tests/unit/test_config.py Adds unit coverage for ignore-datasets download behavior on network failures.
packages/climate-ref/tests/unit/cli/test_providers.py Adds CLI tests for the new providers setup command and flags.
packages/climate-ref/tests/integration/test_provider_setup.py Adds integration tests asserting lifecycle hook idempotence/behavior.
packages/climate-ref/src/climate_ref/config.py Fixes network-error handling by moving requests.get into the try block.
packages/climate-ref/src/climate_ref/cli/providers.py Adds providers setup command to orchestrate environment/data setup and validation.
packages/climate-ref-pmp/src/climate_ref_pmp/init.py Implements fetch_data() to prefetch the pmp-climatology registry.
packages/climate-ref-ilamb/src/climate_ref_ilamb/init.py Introduces ILAMBProvider with fetch_data() to prefetch ILAMB registries.
packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/init.py Introduces ESMValToolProvider with fetch_data() to prefetch ESMValTool registry.
packages/climate-ref-core/tests/unit/test_providers.py Adds unit tests for lifecycle hook defaults and conda-provider setup/validation.
packages/climate-ref-core/src/climate_ref_core/testing.py Adds NetworkBlockedError placeholder for offline-network testing support.
packages/climate-ref-core/src/climate_ref_core/providers.py Adds lifecycle hook APIs to DiagnosticProvider and conda overrides for env setup/validation.
changelog/498.feature.md Documents the new lifecycle hooks and ref providers setup command.
Makefile Stops ignoring a (now absent) offline-solve integration test file.
.github/workflows/ci-offline-solve.yaml Updates offline-solve workflow to run providers setup and adds caching and host directory mounting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link

codecov bot commented Jan 28, 2026

- Replace `ref providers create-env` with `ref providers setup` as the
  recommended command throughout documentation
- Add lifecycle hooks documentation for provider developers (setup_environment,
  fetch_data, post_setup, validate_setup)
- Add offline execution warning to diagnostic implementation guide
- Update HPC executor docs with concrete setup commands
- Fix example provider README to use correct API (DiagnosticProvider)
- Remove redundant provider-specific fetch commands from download guide
  (now handled by `ref providers setup`)
- Update HPC prefetch script to use new workflow
@lewisjared
Copy link
Contributor Author

lewisjared commented Jan 28, 2026

Follow-up work

After this PR merges, we should address the following in separate PRs:

  1. Move PMP climatology fetching and ingestion into PMP provider setup

    • Currently users must run ref datasets fetch-data --registry pmp-climatology and ref datasets ingest --source-type pmp-climatology separately
    • The PMP provider's fetch_data() hook already downloads to pooch cache, but the data also needs to be ingested into the catalog for the solver
    • Ideally ref providers setup would handle both fetching AND ingesting for pmp-climatology
  2. Fetch cartopy data for ILAMB provider

    • ILAMB uses cartopy for map plotting which requires shapefiles
    • The cartopy data download should be handled by the ILAMB provider's fetch_data() hook or post_setup() hook
    • Currently this is done via a separate script (scripts/download-cartopy-data.py)
  3. Fetch ESMValTool data

    • ESMValTool recipes should be fetched during provider setup. This ensures all recipe files are available for offline execution
    • ESMValTool uses cartopy for map plotting which requires shapefiles
    • ESMValTool fire diagnostic downloads pre-processed observational data from Zenodo, this also needs to be downloaded beforehand.

Also ignore list.

- Remove unused NetworkBlockedError exception (dead code)
- Add skip_env/skip_data parameters to DiagnosticProvider.setup()
- Use provider.setup() in CLI instead of manually calling hooks
- Add validate_setup() call after setup to provide immediate feedback
- Improve exception logging with stack traces via logger.opt(exception=True)
- Add tests for new skip_env/skip_data parameters
Run this on a login node with internet access before solving on compute nodes.

Examples
--------
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't render nicely:

...
 This command prepares all providers for offline execution by: 1. Creating conda environments (if applicable) 2. Fetching required reference datasets to pooch cache                                                                                          
 All operations are idempotent and safe to run multiple times. Run this on a login node with internet access before solving on compute nodes.                                                                                                                 
 Examples --------     # Setup all providers     ref providers setup                                                                                                                                                                                          
 # Setup only ESMValTool provider     ref providers setup --provider esmvaltool                                                                                                                                                                               
 # Only validate, don't run setup     ref providers setup --validate-only                                                                                                                                                                                     
...



@app.command()
def setup(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about deprecating the create-env command, since it will no longer be needed and may cause confusion?

It be nice if ref providers list also shows the paths where data is installed.


def validate_setup(self, config: Config) -> bool:
"""Validate conda environment exists."""
return self.env_path.exists()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To really check that the pooch data has downloaded successfully, you could run the download again, that will compute a checksum for each file and compare it to the reference value.

for provider_ in providers:
if validate_only:
is_valid = provider_.validate_setup(config)
status = "[green]valid[/green]" if is_valid else "[red]invalid[/red]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this message was more informative: let the user know what is missing/wrong and how do they solve it? E.g. "Conda environment for PMP is not available at /path/to/software/conda/pmp-a0cd98ec96f7c858f788439a4f0b4951fe2d7a55, please run ref providers setup --provider pmp to install it.`

"""
pass

def post_setup(self, config: Config) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe leave this one out until we actually need it?

lewisjared and others added 2 commits January 29, 2026 10:02
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
- Deprecate create-env command with warning, recommend ref providers setup
- Add data path column to ref providers list output
- Add validate_registry_cache() for checksum verification of cached data
- Providers now verify checksums in validate_setup()
- Add informative error messages with paths and suggested commands
- Remove unused post_setup() lifecycle hook
- Simplify docstring formatting for better CLI rendering
- Update CLAUDE.md to reference setup instead of create-env
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants