AETHER-xAI

Description

This project develops an Earth observation (EO) embedding/language model that can be used for explainable predictions from EO data.

Getting Started

Virtual environment

To install the dependencies in a virtual environment (venv) using uv, first clone the repo:

# clone project
git clone https://github.com/WUR-AI/aether
cd aether

Then, create a virtual environment (or alternatively via conda):

# Create venv
python3 -m venv .venv
source .venv/bin/activate

Then install uv and use it to install all packages:

# install uv manager
pip install uv

# install all Python dependencies
uv sync # reads pyproject.toml + uv.lock

# install project locally (editable)
uv pip install -e .

Note that running uv sync in the venv always updates the installed packages to the versions defined by the repo's pyproject.toml and uv.lock files.

Set paths

Next, create a file called .env in the root folder of your local repo (aether/) by copying aether/.env.example:

cp .env.example .env

Adjust the paths in .env to your local system. At a minimum, set PROJECT_ROOT.

Important: DATA_DIR should point either to aether/data/ (the default) or to another folder (e.g., my/local/data/). In the latter case, copy the contents of aether/data/ to my/local/data/ so that the butterfly use case can run on the provided example data. Other data is downloaded and organised into DATA_DIR automatically by pooch where possible; otherwise, copy it there manually.
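
As an illustration of the rule above, the following minimal sketch parses a .env file and resolves the data folder. It assumes plain KEY=VALUE lines and that an empty or missing DATA_DIR falls back to the data/ folder under PROJECT_ROOT. The helper names read_env_file and resolve_data_dir are hypothetical, and the repo itself may load .env differently (for instance through a dotenv library).

```python
from pathlib import Path
import tempfile


def read_env_file(env_path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(env_path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_data_dir(env):
    """Return DATA_DIR if set, else fall back to <PROJECT_ROOT>/data (assumed default)."""
    project_root = Path(env["PROJECT_ROOT"])  # PROJECT_ROOT must be set
    data_dir = env.get("DATA_DIR")
    return Path(data_dir) if data_dir else project_root / "data"


if __name__ == "__main__":
    # Demonstrate with a throwaway .env file; the paths are made up.
    with tempfile.TemporaryDirectory() as tmp:
        env_path = Path(tmp) / ".env"
        env_path.write_text("PROJECT_ROOT=/home/me/aether\nDATA_DIR=my/local/data\n")
        print(resolve_data_dir(read_env_file(env_path)))
```

Raising a KeyError when PROJECT_ROOT is missing is a deliberate choice in this sketch, since that variable is the one path that always has to be set.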

Data folders should have the following structure within DATA_DIR:

├── registry.txt                         <- Pooch config file, don't change.
├── s2bms/                               <- Dataset folder.
│   ├── model_ready_s2bms.csv            <- Csv file with "name_loc" id, locations, aux data and target data.
│   ├── aux_classes.csv                  <- Csv file with explanations for aux data class names.
│   ├── caption_templates/               <- Caption templates
│   │   └── v1.json                      <- Json file with list of caption templates (referencing aux column names).
│   ├── splits/                          <- Torch data splits
│   ├── source/                          <- Optional: source data used to create model_ready csv.
│   └── eo/                              <- EO data modalities
│       ├── s2/                          <- Modality 1: (e.g. sentinel-2)
│       │   ├── s2_<NAME_LOC_1>.tif      <- EO modality data for a single location (indexed by unique <NAME_LOC>)
│       │   └── s2_<NAME_LOC_2>.tif
│       ├── aef/                         <- Modality 2: (e.g. AEF)
│       └── other_modality/
└── other_dataset/
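
The layout above implies a simple mapping from a location id to its EO file. The sketch below builds that path and lists locations whose file is missing. It assumes the file prefix always equals the modality folder name, as in the s2 example; this is not confirmed for other modalities. The function names are hypothetical and not part of the repo.

```python
from pathlib import Path


def eo_file_path(data_dir, dataset, modality, name_loc):
    """Build <DATA_DIR>/<dataset>/eo/<modality>/<modality>_<name_loc>.tif."""
    # Assumption: the filename prefix matches the modality folder (e.g. s2/s2_*.tif).
    return Path(data_dir) / dataset / "eo" / modality / f"{modality}_{name_loc}.tif"


def missing_eo_files(data_dir, dataset, modality, name_locs):
    """Return the name_loc ids whose EO file is not present on disk."""
    return [
        name_loc
        for name_loc in name_locs
        if not eo_file_path(data_dir, dataset, modality, name_loc).is_file()
    ]


if __name__ == "__main__":
    # "loc_1" is a made-up location id used only for illustration.
    print(eo_file_path("data", "s2bms", "s2", "loc_1").as_posix())
```

A check like missing_eo_files can be run against the "name_loc" column of the model-ready CSV before training to catch incomplete downloads early.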

Verify installation

To verify that the installation was successful, run the tests from aether/ using:

pytest --use-mock -m "not slow"

All tests should pass.
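
For context: -m "not slow" deselects tests carrying a slow marker, and --use-mock is not a built-in pytest option, so the project has to register it. The sketch below shows how a conftest.py could do both with standard pytest hooks. It is a hypothetical illustration that we have not checked against the repo's actual conftest.py, and the help text describing what the flag does is an assumption.

```python
# conftest.py (hypothetical sketch, not copied from the aether repo)


def pytest_addoption(parser):
    """Register the custom --use-mock command-line flag with pytest."""
    parser.addoption(
        "--use-mock",
        action="store_true",
        default=False,
        # Assumed meaning of the flag; the repo may define it differently.
        help="Use mock data instead of real data in tests.",
    )


def pytest_configure(config):
    """Declare the 'slow' marker so that -m "not slow" can deselect it."""
    config.addinivalue_line("markers", "slow: marks a test as slow to run")
```

Tests would then opt in with the @pytest.mark.slow decorator, and fixtures could read the flag via request.config.getoption("--use-mock").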

Training

Currently, we have implemented two models: a prediction model (which predicts target variables from EO data) and an alignment model (which aligns EO embeddings with text embeddings).

Experiment configurations (such as the choice of data, encoders, and hyperparameters) are managed through Hydra. Define your experiment configuration in configs/experiment/experiment_name.yaml. For example, to train the predictive model with the GeoCLIP coordinate encoder on the Butterfly data, use configs/experiment/prediction.yaml (copied below):

# @package _global_
# all parameters below will be merged with parameters from default configurations set above
# this allows you to overwrite only specified parameters

defaults:
  - override /model: predictive_geoclip
  - override /data: butterfly_coords


tags: ["prediction", "geoclip_coords"]

seed: 12345

trainer:
  min_epochs: 1
  max_epochs: 100

data:
  batch_size: 64

logger:
  wandb:
    tags: ${tags}
    group: "predictive"
  aim:
    experiment: "predictive"
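
The comments at the top of the config say that its parameters are merged with the defaults and overwrite only what is specified. The sketch below imitates that behaviour with plain dictionaries. It is a simplification and not Hydra's actual implementation; the values in defaults (including the accelerator key) are hypothetical, while the experiment values are taken from the config above.

```python
import copy


def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical default configuration (values are made up for illustration).
defaults = {
    "seed": None,
    "trainer": {"min_epochs": 1, "max_epochs": 10, "accelerator": "cpu"},
    "data": {"batch_size": 32},
}

# Values from the experiment config shown above.
experiment = {
    "seed": 12345,
    "trainer": {"min_epochs": 1, "max_epochs": 100},
    "data": {"batch_size": 64},
}

merged = deep_merge(defaults, experiment)

if __name__ == "__main__":
    print(merged)
```

Keys that the experiment file does not mention, such as the hypothetical trainer.accelerator here, keep their default value, which is why an experiment file only needs to list what it changes.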

To execute this experiment, run (inside your venv):

python src/train.py experiment=prediction

Please see the Hydra and Hydra-Lightning template documentation for further examples of how to configure training runs.

Directory structure

We follow the directory structure from the Hydra-Lightning template, which looks like:

├── .github                   <- Github Actions workflows
│
├── configs                   <- Hydra configs
│   ├── callbacks                <- Callbacks configs
│   ├── data                     <- Data configs
│   ├── debug                    <- Debugging configs
│   ├── experiment               <- Experiment configs
│   ├── extras                   <- Extra utilities configs
│   ├── hparams_search           <- Hyperparameter search configs
│   ├── hydra                    <- Hydra configs
│   ├── local                    <- Local configs
│   ├── logger                   <- Logger configs
│   ├── model                    <- Model configs
│   ├── paths                    <- Project paths configs
│   ├── trainer                  <- Trainer configs
│   │
│   ├── eval.yaml             <- Main config for evaluation
│   └── train.yaml            <- Main config for training
│
├── data                   <- Project data (for aether, this can also be elsewhere, see environment paths).
│
├── logs                   <- Logs generated by hydra and lightning loggers
│
├── notebooks              <- Jupyter notebooks. Naming convention is a number (for ordering),
│                             the creator's initials, and a short `-` delimited description,
│                             e.g. `1.0-jqp-initial-data-exploration.ipynb`.
│
├── scripts                <- Shell scripts
│
├── src                    <- Source code
│   ├── data                     <- Data scripts
│   ├── data_prepocessing        <- Data preprocessing scripts
│   ├── models                   <- Model scripts
│   ├── utils                    <- Utility scripts
│   │
│   ├── eval.py                  <- Run evaluation
│   └── train.py                 <- Run training
│
├── tests                  <- Tests of any kind
│
├── .env.example              <- Example of file for storing private environment variables
├── .gitignore                <- List of files ignored by git
├── .pre-commit-config.yaml   <- Configuration of pre-commit hooks for code formatting
├── .project-root             <- File for inferring the position of project root directory
├── environment.yaml          <- File for installing conda environment
├── Makefile                  <- Makefile with commands like `make train` or `make test`
├── pyproject.toml            <- Environment requirements and configuration options for testing and linting
├── setup.py                  <- File for installing project as a package
├── uv.lock                   <- A frozen snapshot of exact dependencies for the uv package manager.
└── README.md

Attribution

This repo is based on the Hydra-Lightning template. Some code was adapted from github.com/vdplasthijs/PECL/.
