Implementation of the DreamGym framework from the paper "Scaling Agent Learning via Experience Synthesis" (arXiv:2511.03773).
DreamGym is a unified framework that synthesizes diverse experiences to enable effective online reinforcement learning (RL) training for autonomous agents. It addresses the challenges of costly rollouts, limited task diversity, unreliable reward signals, and infrastructure complexity.
- Reasoning Experience Model: Generates state transitions through chain-of-thought reasoning instead of expensive real environment rollouts
- Experience Replay Buffer: Stores and manages both real-world and synthesized experiences with quality filtering
- Curriculum Task Generator: Adaptively generates tasks at appropriate difficulty levels based on agent performance
- PPO Training: Implements Proximal Policy Optimization for policy learning
- Multi-Environment Support: Designed for WebArena, ALFWorld, and Tau-Bench
DreamGym/
├── src/dreamgym/
│ ├── core/ # Core data structures and configuration
│ │ ├── data_structures.py
│ │ └── config.py
│ ├── models/ # Core components
│ │ ├── reasoning_model.py
│ │ ├── replay_buffer.py
│ │ └── curriculum_generator.py
│ ├── training/ # Training pipeline
│ │ ├── policy.py
│ │ ├── ppo.py
│ │ ├── trainer.py
│ │ ├── train.py
│ │ └── evaluate.py
│ ├── environments/ # Environment integrations
│ │ └── base_env.py
│ └── utils/ # Utility functions
├── configs/ # Configuration files
│ └── default.yaml
├── data/ # Data storage
│ ├── offline/ # Offline demonstration data
│ ├── experiences/ # Experience replay data
│ └── checkpoints/ # Model checkpoints
├── tests/ # Unit and integration tests
├── logs/ # Training logs
├── results/ # Experiment results
├── requirements.txt # Python dependencies
└── setup.py # Package setup
- Python 3.9 or higher
- CUDA-capable GPU (recommended for faster training)
- 32GB+ RAM (64GB+ recommended)
- Clone the repository:
git clone <repository-url>
cd DreamGym

- Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:
pip install -r requirements.txt
pip install -e .

- Set up API keys (optional, for LLM access):
export OPENAI_API_KEY="your-api-key"
# or
export ANTHROPIC_API_KEY="your-api-key"

Train the DreamGym agent with the default configuration:
python -m dreamgym.training.train

With a custom configuration:
python -m dreamgym.training.train --config configs/custom.yaml

With command-line overrides:
python -m dreamgym.training.train \
--env webarena \
--num-iterations 100 \
--batch-size 64 \
--use-wandb \
    --seed 42

Evaluate a trained policy:
python -m dreamgym.training.evaluate \
--checkpoint data/checkpoints/policy_iter_0100.json \
--env webarena \
--num-episodes 20 \
    --output results/eval_results.json

The system is configured via YAML files. Key configuration sections:
- reasoning_model: Reasoning experience model settings
- policy_model: Agent policy model settings
- buffer: Experience replay buffer parameters
- curriculum: Curriculum learning parameters
- rl: RL algorithm hyperparameters
- training: Training loop settings
- environment: Environment configuration
See configs/default.yaml for a complete example.
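As a quick check, here is a minimal sketch of loading the shipped config with PyYAML, assuming configs/default.yaml is keyed by the sections listed above:

import yaml

# Load the default configuration shipped with the repository
with open("configs/default.yaml") as f:
    cfg = yaml.safe_load(f)

# Expect the top-level sections listed above, e.g.
# ['reasoning_model', 'policy_model', 'buffer', 'curriculum', 'rl', 'training', 'environment']
print(list(cfg.keys()))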
The reasoning experience model generates synthetic state transitions using LLM-based reasoning:
from dreamgym.models.reasoning_model import ReasoningExperienceModel
from dreamgym.core.config import ReasoningModelConfig
config = ReasoningModelConfig()
reasoning_model = ReasoningExperienceModel(config, llm_client)
experience = reasoning_model.generate_experience(
state=current_state,
action=agent_action,
task=task
)

The experience replay buffer stores and samples experiences for training:
from dreamgym.models.replay_buffer import ExperienceReplayBuffer
from dreamgym.core.config import BufferConfig
config = BufferConfig(capacity=100000)
buffer = ExperienceReplayBuffer(config)
# Add experiences
buffer.add_experience(experience)
# Sample for training
batch = buffer.sample_balanced(batch_size=64)

The curriculum task generator adaptively generates tasks based on agent performance:
from dreamgym.models.curriculum_generator import CurriculumTaskGenerator
from dreamgym.core.config import CurriculumConfig
config = CurriculumConfig()
generator = CurriculumTaskGenerator(config)
# Generate tasks
tasks = generator.generate_tasks(num_tasks=10, llm_client=llm_client)
# Update performance
generator.update_performance(completed_episode)

The main training loop integrates all components (a minimal sketch follows this list):
- Task Generation: Curriculum generator creates tasks at appropriate difficulty
- Experience Collection: Agent performs rollouts (synthetic or real)
- Buffer Management: Experiences stored with quality filtering
- Policy Update: PPO updates policy using sampled experiences
- Evaluation: Periodic evaluation and checkpoint saving
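The sketch below shows how one iteration might fit together, reusing the objects constructed in the component examples above. Names such as policy.act, ppo.update, task.initial_state, and experience.next_state are illustrative assumptions, not the exact API of dreamgym.training:

# Illustrative only: the real loop lives in dreamgym.training.trainer.
# Assumes generator, llm_client, reasoning_model, buffer come from the examples above.
for iteration in range(num_iterations):
    # 1. Task generation at the current curriculum difficulty
    tasks = generator.generate_tasks(num_tasks=10, llm_client=llm_client)

    for task in tasks:
        state = task.initial_state                      # assumed attribute
        for _ in range(max_episode_steps):
            action = policy.act(state)                  # assumed policy API
            # 2. Experience collection via synthetic rollouts
            experience = reasoning_model.generate_experience(
                state=state, action=action, task=task
            )
            # 3. Buffer management with quality filtering
            buffer.add_experience(experience)
            state = experience.next_state               # assumed attribute

    # 4. Policy update with PPO on a balanced sample of real and synthetic data
    batch = buffer.sample_balanced(batch_size=64)
    ppo.update(policy, batch)                           # assumed PPO API

    # 5. Curriculum feedback; periodic evaluation and checkpointing happen here
    generator.update_performance(completed_episode)     # as in the curriculum example above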
To reproduce results from the paper:
- WebArena Experiments:
python -m dreamgym.training.train --config configs/webarena.yaml

- ALFWorld Experiments:
python -m dreamgym.training.train --config configs/alfworld.yaml

- Tau-Bench Experiments:
python -m dreamgym.training.train --config configs/taubench.yaml

Run ablation studies by modifying the configuration (a minimal sketch follows this list):
- Without Reasoning Model: Set buffer.synthetic_ratio = 0.0
- Without Curriculum: Set curriculum.difficulty_increment = 0.0
- Synthetic Only: Remove offline real data
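For example, a sketch of the first two ablations via the config objects, assuming synthetic_ratio and difficulty_increment are accepted as constructor arguments matching the dotted names above:

from dreamgym.core.config import BufferConfig, CurriculumConfig

no_synthetic = BufferConfig(synthetic_ratio=0.0)            # "without reasoning model"
no_curriculum = CurriculumConfig(difficulty_increment=0.0)  # "without curriculum"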
Run the test suite:

pytest tests/

With coverage:
pytest tests/ --cov=dreamgym --cov-report=html

Format code with Black:
black src/

Lint with flake8:
flake8 src/

Type checking with mypy:
mypy src/

View training metrics with TensorBoard:
tensorboard --logdir logs/

Enable W&B logging:
python -m dreamgym.training.train --use-wandb

- Import Errors: Ensure the package is installed with pip install -e .
- Out of Memory: Reduce batch size or buffer capacity (see the sketch after this list)
- Slow Training: Enable GPU acceleration, reduce max episode steps
- Poor Quality Experiences: Adjust quality threshold or validation settings
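For the out-of-memory case, a small sketch using the buffer API shown earlier (the values are illustrative, not recommendations):

from dreamgym.models.replay_buffer import ExperienceReplayBuffer
from dreamgym.core.config import BufferConfig

# Smaller buffer capacity and smaller training batches reduce peak memory use
buffer = ExperienceReplayBuffer(BufferConfig(capacity=10000))
batch = buffer.sample_balanced(batch_size=16)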
Enable verbose logging:
import logging
logging.basicConfig(level=logging.DEBUG)

If you use this implementation, please cite the original paper:
@article{chen2025dreamgym,
title={Scaling Agent Learning via Experience Synthesis},
author={Chen, Zhaorun and others},
journal={arXiv preprint arXiv:2511.03773},
year={2025}
}

This implementation is provided for research purposes.
This is a reproduction of the DreamGym framework described in arXiv:2511.03773; the original research was conducted by authors from multiple institutions.
For questions about this implementation, please open an issue on the repository.