Hi, thanks for the interesting paper and for releasing such a nice codebase.
I’m trying to run the codebase on 8× NVIDIA A6000 (48GB) GPUs, but I’m consistently hitting CUDA out-of-memory (OOM) errors. Do you have recommendations on which hyperparameters are the most effective to tune to reduce GPU memory usage while preserving results as much as possible?
For example:
- number of frames
- number of rollouts
- completion length (generation length)
- per-device batch size
- any other memory-critical settings you recommend adjusting first
If there are known “safe” ranges for smaller GPU budgets, that would be very helpful as well.
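For reference, here is roughly the kind of override I have been experimenting with so far. This assumes the repo exposes TRL-style `GRPOConfig` arguments; the argument names below (and `num_frames` in particular) are my guesses, not necessarily your actual config keys:

```python
# Sketch of the memory-saving overrides I'm trying, assuming a TRL-style GRPOConfig.
# All argument names here are my assumptions about this codebase, not confirmed.
from trl import GRPOConfig

config = GRPOConfig(
    output_dir="outputs",
    per_device_train_batch_size=1,   # smallest per-GPU batch
    gradient_accumulation_steps=8,   # keep the effective batch size up
    num_generations=4,               # fewer rollouts per prompt
    max_completion_length=512,       # shorter generations
    gradient_checkpointing=True,     # trade compute for activation memory
    bf16=True,                       # half-precision training
    # num_frames=32,                 # hypothetical name for the video-frame count
)
```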
I also have follow-up questions:
- Are the Table 15 results from separate training runs, each with a different number of frames (e.g., training with 64 frames and then running inference with 64 frames)?
- Is the total batch size computed as `num_gpus * per_device_train_batch_size * steps_per_generation`?
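Just to check that I'm reading it right, here is a quick worked example of my interpretation (the value of `steps_per_generation` is only illustrative, not taken from the paper):

```python
# Worked example of the batch-size formula above for my 8-GPU setup;
# steps_per_generation=4 is an illustrative value I picked, not from the paper.
num_gpus = 8
per_device_train_batch_size = 1
steps_per_generation = 4

total_batch_size = num_gpus * per_device_train_batch_size * steps_per_generation
print(total_batch_size)  # -> 32
```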
Thank you!