Install the core platoon package:
# Using uv (recommended)
uv sync
# Using pip
pip install -e .
Platoon supports two training backends: Tinker and AReaL. Install the one you need:
# Using uv
uv sync --extra tinker
# Using pip
pip install -e ".[tinker]"
Note: AReaL must be installed with uv because it is not available on PyPI.
# Using uv only (required)
uv sync --extra areal
WandB should be installed alongside your chosen training backend:
# With Tinker backend
# Using uv
uv sync --extra tinker --extra wandb
# Using pip
pip install -e ".[tinker,wandb]"
# With AReaL backend (uv only)
uv sync --extra areal --extra wandb
Install a plugin or extension:
cd plugins/<plugin-name>
uv sync # or: pip install -e .
Platoon supports two training backends: Tinker and AReaL.
Tinker uses a service-based architecture. Make sure your Tinker service is running before training.
cd plugins/textcraft # or number-search, codegrep
# Using uv
uv run python -m platoon.textcraft.train_tinker --config textcraft_tinker.yaml
# Using python directly (after pip install)
python -m platoon.textcraft.train_tinker --config textcraft_tinker.yaml
Override config values from the command line:
uv run python -m platoon.textcraft.train_tinker \
    --config textcraft_tinker.yaml \
    stats.experiment_name=my-experiment \
    stats.trial_name=trial1 \
    train.batch_size=64 \
    train.optimizer.learning_rate=1e-5
Enable WandB logging by setting the mode in your config or via CLI:
# Via CLI override
uv run python -m platoon.textcraft.train_tinker \
    --config textcraft_tinker.yaml \
    stats.wandb.mode=online \
    stats.wandb.project=my-project
Or in your YAML config:
stats:
  experiment_name: my-experiment
  trial_name: trial1
  wandb:
    mode: online # Options: online, offline, disabled
    project: my-project
    entity: my-team # optional
    tags:
      - experiment-tag
AReaL uses a distributed training architecture. Refer to AReaL documentation for detailed setup instructions.
cd plugins/textcraft # or number-search, codegrep
uv run python3 -m areal.launcher.local \
    platoon/textcraft/train.py \
    --config platoon/textcraft/textcraft_areal.yaml \
    experiment_name=textcraft-reinforce \
    trial_name=trial0
See AReaL documentation for distributed training setup.
# Training configuration
train:
  model_name: Qwen/Qwen3-4B # HuggingFace model identifier
  renderer_name: qwen3 # Prompt renderer type
  batch_size: 32
  num_epochs: 10
  lora_rank: 32
  optimizer:
    learning_rate: 1e-6
  workflow_config:
    group_size: 8 # Rollouts per task for GRPO
    rollout_config:
      max_steps: 50
      timeout: 900
# Eval configuration
eval:
  strategy: epoch # When to evaluate: epoch, step, none
  every: 1 # Frequency of evaluation
# Checkpoint configuration
checkpoint:
  strategy: epoch
  every: 5
  load_checkpoint_path: null # Resume from checkpoint
# Paths
log_path: ./logs
tinker_base_url: null # Tinker service URL (uses default if null)
# Stats and logging
stats:
  experiment_name: my-experiment
  trial_name: trial1
  wandb:
    mode: online
    project: my-project
See AReaL documentation for config options.
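The dotted command-line overrides shown earlier (for example `train.batch_size=64`) map onto the nested keys of this YAML config. The sketch below illustrates that mapping on a plain dictionary. It is not Platoon's actual override code: `parse_value` and `apply_overrides` are hypothetical helpers written for this example, and the real launcher may coerce types differently.

```python
from typing import Any


def parse_value(raw: str) -> Any:
    """Best-effort conversion of a CLI string into a Python value."""
    lowered = raw.lower()
    if lowered in ("null", "none"):
        return None
    if lowered in ("true", "false"):
        return lowered == "true"
    # Try int first so "64" stays an integer, then float for "1e-5".
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw  # fall back to the raw string, e.g. "online"


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply `a.b.c=value` overrides in place and return the config."""
    for item in overrides:
        dotted_key, sep, raw_value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate sections that the config does not have yet.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw_value)
    return config


# A trimmed-down stand-in for the YAML config above.
config = {
    "train": {"batch_size": 32, "optimizer": {"learning_rate": 1e-6}},
    "stats": {"experiment_name": "my-experiment"},
}
apply_overrides(
    config,
    [
        "train.batch_size=64",
        "train.optimizer.learning_rate=1e-5",
        "stats.wandb.mode=online",
    ],
)
```

Each dot in the key descends one level, so `stats.wandb.mode=online` sets `mode` under `wandb` under `stats`, creating the `wandb` section if the file did not define it.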
See the dedicated guide: Trajectory visualization CLI.
# Install dev dependencies (include your existing extras to preserve them)
uv sync --extra tinker --group dev # Tinker backend
uv sync --extra areal --group dev # AReaL backend
uv sync --extra tinker --extra wandb --group dev # Tinker + WandB
# Install pre-commit hooks
uvx pre-commit install
Run the tests:
uv run pytest tests/ -v
The project uses ruff for linting/formatting and ty for type checking. Both run automatically via pre-commit hooks.
# Run all pre-commit checks manually
uvx pre-commit run --all-files
# Run individual tools
uv run ruff check . # Lint
uv run ruff format . # Format
uvx ty check # Type check
Pre-commit hooks run automatically on git commit. They include:
- ruff: Linting with auto-fix
- ruff-format: Code formatting
- ty: Type checking
- conventional-pre-commit: Validates commit message format
If a hook fails, fix the issues and commit again.
This project uses Conventional Commits. Commit messages must follow the format:
type(scope): description
Common types: feat, fix, docs, style, refactor, test, chore
Examples:
feat: add user authentication
fix(api): handle null response
docs: update README
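To check a message locally before committing, the `type(scope): description` format can be approximated with a regular expression. This is a minimal sketch, not the logic of the conventional-pre-commit hook: `parse_header` is a hypothetical helper, it accepts only the common types listed above, and it ignores parts of the Conventional Commits format such as the `!` breaking-change marker.

```python
import re

# The common types listed above; the real hook may accept a different set.
COMMON_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"             # commit type, e.g. "feat"
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope in parentheses
    r": (?P<description>\S.*)$"      # colon, one space, non-empty description
)


def parse_header(header: str):
    """Return the parsed parts of a commit header, or None if it is invalid."""
    match = HEADER_RE.match(header)
    if match is None or match.group("type") not in COMMON_TYPES:
        return None
    return match.groupdict()


for message in (
    "feat: add user authentication",
    "fix(api): handle null response",
    "update README",
):
    print(message, "->", "ok" if parse_header(message) else "rejected")
```

The first two messages pass; the third is rejected because it has no `type:` prefix.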
Pull requests and pushes to main trigger CI checks (see .github/workflows/ci.yml):
- pr-title: Validates PR title follows conventional commit format
- lint: Runs pre-commit hooks (ruff + ty)
- test: Runs pytest