Experiment Launcher¶
The simplest way to run experiments is via the command line; see Basic Usage for an example.
For more complex workflows Sample Factory provides an interface that lets users run experiments with multiple seeds or hyperparameter combinations, automatically distributing the work across GPUs on a single machine or across multiple machines on a cluster.
Such experiments are configured directly in Python code, i.e. instead of YAML or JSON files we use Python scripts for ultimate flexibility.
Launcher scripts¶
Take a look at sf_examples/mujoco/experiments/mujoco_all_envs.py:
from sample_factory.launcher.run_description import Experiment, ParamGrid, RunDescription

_params = ParamGrid(
    [
        ("seed", [0, 1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999]),
        ("env", ["mujoco_ant", "mujoco_halfcheetah", "mujoco_hopper", "mujoco_humanoid", "mujoco_doublependulum", "mujoco_pendulum", "mujoco_reacher", "mujoco_swimmer", "mujoco_walker"]),
    ]
)

_experiments = [
    Experiment(
        "mujoco_all_envs",
        "python -m sf_examples.mujoco.train_mujoco --algo=APPO --with_wandb=True --wandb_tags mujoco",
        _params.generate_params(randomize=False),
    ),
]

RUN_DESCRIPTION = RunDescription("mujoco_all_envs", experiments=_experiments)
This script defines a list of experiments to run. Here we have 10 seeds and 9 environments, so 90 experiments will be run in total, one for each seed/env combination. This extends straightforwardly to hyperparameter searches and so on, as sketched below.
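The same mechanism covers hyperparameter searches: add more parameter lists to the grid and the launcher will enumerate the full Cartesian product. The snippet below is only an illustrative sketch (the learning rate values are arbitrary, not a recommended search range):

# Hypothetical extension of the grid above: sweep the learning rate in addition to seed/env.
_params = ParamGrid(
    [
        ("seed", [0, 1111, 2222]),
        ("env", ["mujoco_ant", "mujoco_halfcheetah"]),
        ("learning_rate", [1e-4, 3e-4, 1e-3]),  # illustrative values only
    ]
)
# 3 seeds x 2 envs x 3 learning rates = 18 experiments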
The only requirement for such a script is that it defines a RUN_DESCRIPTION variable that references a RunDescription object.
This object contains a list of Experiment objects, each of which potentially defines a gridsearch to run.
Each Experiment defines a name, a "base" command line, and a ParamGrid that generates parameter combinations to be appended to the base command line.
Take a look at other experiment scripts in sf_examples to see how to define more complex experiments; a sketch with more than one Experiment is shown below.
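For instance, a single run can group several Experiment objects, each with its own base command line and grid. The following is a hypothetical sketch (the experiment names and the batch_size sweep are made up for illustration):

# Hypothetical launcher script combining two experiments in one run.
_baseline_grid = ParamGrid([("seed", [0, 1111])])
_sweep_grid = ParamGrid([("seed", [0, 1111]), ("batch_size", [1024, 2048])])

_experiments = [
    Experiment(
        "ant_baseline",
        "python -m sf_examples.mujoco.train_mujoco --algo=APPO --env=mujoco_ant",
        _baseline_grid.generate_params(randomize=False),
    ),
    Experiment(
        "ant_batch_sweep",
        "python -m sf_examples.mujoco.train_mujoco --algo=APPO --env=mujoco_ant",
        _sweep_grid.generate_params(randomize=False),
    ),
]

RUN_DESCRIPTION = RunDescription("mujoco_ant_study", experiments=_experiments)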
Note that there's no requirement to use the Launcher API to run experiments. You can run individual experiments from the command line, use WandB hyperparameter search features, Ray Tune, or any other tool you like. The Launcher API is just a convenient out-of-the-box feature for simple workflows.
Complex hyperparameter configurations¶
The ParamGrid object above defines a Cartesian product of parameter lists.
In some cases we want to search over pairs (or tuples) of parameters at the same time.
For example:
_params = ParamGrid(
    [
        ("seed", [1111, 2222, 3333, 4444]),
        (("serial_mode", "async_rl"), ([True, False], [False, True])),
        (("use_rnn", "recurrence"), ([False, 1], [True, 16])),
    ]
)
Here we consider the parameter pairs ("serial_mode", "async_rl") and ("use_rnn", "recurrence") at the same time.
If we used a simple grid, we would have to execute useless combinations of parameters such as use_rnn=True, recurrence=1 or use_rnn=False, recurrence=16 (it only makes sense to use recurrence > 1 with RNNs). See the sketch below for the combinations this grid generates.
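If you want to inspect exactly which combinations a grid will produce before launching anything, you can iterate over generate_params() yourself. This is a quick sanity-check sketch; the exact structure of each generated element is an assumption here (treated as a mapping of parameter names to values):

# Print all parameter combinations produced by the paired grid above.
# Expected: 4 seeds x 2 (serial_mode, async_rl) pairs x 2 (use_rnn, recurrence) pairs = 16 combinations,
# with use_rnn=True always paired with recurrence=16 and use_rnn=False always paired with recurrence=1.
for combo in _params.generate_params(randomize=False):
    print(combo)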
RunDescription arguments¶
A launcher script should expose a RunDescription object named RUN_DESCRIPTION that contains a list of experiments to run and some auxiliary parameters.
RunDescription parameter reference:
class RunDescription:
    def __init__(
        self,
        run_name,
        experiments,
        experiment_arg_name="--experiment",
        experiment_dir_arg_name="--train_dir",
        customize_experiment_name=True,
        param_prefix="--",
    ):
        """
        :param run_name: overall name of the experiment and the name of the root folder
        :param experiments: a list of Experiment objects to run
        :param experiment_arg_name: CLI argument of the underlying experiment that determines its unique name
            to be generated by the launcher. Default: --experiment
        :param experiment_dir_arg_name: CLI argument for the root train dir of your experiment. Default: --train_dir
        :param customize_experiment_name: whether to add a hyperparameter combination to the experiment name
        :param param_prefix: most experiments will use the "--" prefix for each parameter, but some apps don't have this
            prefix, e.g. with Hydra you should set it to an empty string.
        """
Using a launcher script¶
The script above can be executed using one of several backends. Additional backends are a welcome contribution! Please submit PRs :)
"Local" backend (multiprocessing)¶
The command line below will run all experiments on a single 4-GPU machine, scheduling 2 experiments per GPU, i.e. running 8 experiments in parallel until all 90 are done.
Note how we pass the full path to the launcher script via the --run argument.
The script should be on your Python path such that the module can be imported using the path you pass to --run (because this is exactly what the Launcher does internally).
python -m sample_factory.launcher.run --run=sf_examples.mujoco.experiments.mujoco_all_envs --backend=processes --max_parallel=8 --pause_between=1 --experiments_per_gpu=2 --num_gpus=4
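Since the launcher simply imports this module, a quick way to check that the path you pass to --run is importable is to do the import yourself (a sanity check only, not part of the launcher):

# Verify that the launcher script is importable and exposes RUN_DESCRIPTION.
import importlib

module = importlib.import_module("sf_examples.mujoco.experiments.mujoco_all_envs")
print(module.RUN_DESCRIPTION)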
Slurm backend¶
The following command will run experiments on a Slurm cluster, creating a separate job for each experiment.
python -m sample_factory.launcher.run --run=sf_examples.mujoco.experiments.mujoco_all_envs --backend=slurm --slurm_workdir=./slurm_isaacgym --experiment_suffix=slurm --slurm_gpus_per_job=1 --slurm_cpus_per_gpu=16 --slurm_sbatch_template=./sample_factory/launcher/slurm/sbatch_timeout.sh --pause_between=1 --slurm_print_only=False
Here we will use 1 GPU and 16 CPUs per job (adjust according to your cluster configuration and experiment config).
Note how we also pass the --slurm_sbatch_template argument, which points to a bash script that bootstraps each job.
In this particular example we use a template that kills the job if it runs longer than a certain amount of time and then requeues it (controlled by --slurm_timeout, which defaults to 0, i.e. no timeout).
Feel free to use a custom template if your job has certain prerequisites (e.g. installing some packages or activating a Python environment).
Please find additional Slurm considerations in How to use Sample Factory on Slurm guide.
NGC backend¶
We additionally provide a backend for NGC clusters (https://ngc.nvidia.com/).
python -m sample_factory.launcher.run --run=sf_examples.mujoco.experiments.mujoco_all_envs --backend=ngc --ngc_job_template=run_scripts/ngc_job_16g_1gpu.template --ngc_print_only=False --train_dir=/workspace/train_dir
Here --ngc_job_template contains information about which Docker image to run plus any additional job bootstrapping.
The command will essentially spin up a separate VM in the cloud for each job.
Point --train_dir to a mounted workspace folder so that you can access the results of your experiments (trained models, logs, etc.).
Additional CLI examples¶
Local multiprocessing backend:
$ python -m sample_factory.launcher.run --run=sf_examples.vizdoom.experiments.paper_doom_battle2_appo --backend=processes --max_parallel=4 --pause_between=10 --experiments_per_gpu=1 --num_gpus=4
Parallelize with Slurm:
$ python -m sample_factory.launcher.run --run=megaverse_rl.runs.single_agent --backend=slurm --slurm_workdir=./megaverse_single_agent --experiment_suffix=slurm --pause_between=1 --slurm_gpus_per_job=1 --slurm_cpus_per_gpu=12 --slurm_sbatch_template=./megaverse_rl/slurm/sbatch_template.sh --slurm_print_only=False
Parallelize with NGC (https://ngc.nvidia.com/):
$ python -m sample_factory.launcher.run --run=rlgpu.run_scripts.dexterous_manipulation --backend=ngc --ngc_job_template=run_scripts/ngc_job_16g_1gpu.template --ngc_print_only=False --train_dir=/workspace/train_dir
Command-line reference¶
usage: run.py [-h] [--train_dir TRAIN_DIR] [--run RUN]
              [--backend {processes,slurm,ngc}]
              [--pause_between PAUSE_BETWEEN]
              [--experiment_suffix EXPERIMENT_SUFFIX]
              # Multiprocessing backend:
              [--num_gpus NUM_GPUS]
              [--experiments_per_gpu EXPERIMENTS_PER_GPU]
              [--max_parallel MAX_PARALLEL]
              # Slurm-related:
              [--slurm_gpus_per_job SLURM_GPUS_PER_JOB]
              [--slurm_cpus_per_gpu SLURM_CPUS_PER_GPU]
              [--slurm_print_only SLURM_PRINT_ONLY]
              [--slurm_workdir SLURM_WORKDIR]
              [--slurm_partition SLURM_PARTITION]
              [--slurm_sbatch_template SLURM_SBATCH_TEMPLATE]
              [--slurm_timeout SLURM_TIMEOUT]
              # NGC-related:
              [--ngc_job_template NGC_JOB_TEMPLATE]
              [--ngc_print_only NGC_PRINT_ONLY]
Arguments:
  -h, --help            show this help message and exit
  --train_dir TRAIN_DIR
                        Directory for sub-experiments
  --run RUN             Name of the Python module that describes the run, e.g.
                        sf_examples.vizdoom.experiments.doom_basic
  --backend {processes,slurm,ngc}
  --pause_between PAUSE_BETWEEN
                        Pause in seconds between processes
  --experiment_suffix EXPERIMENT_SUFFIX
                        Append this to the name of the experiment dir

Multiprocessing backend:
  --num_gpus NUM_GPUS   How many GPUs to use (only for local multiprocessing)
  --experiments_per_gpu EXPERIMENTS_PER_GPU
                        How many experiments can we squeeze on a single GPU
                        (-1 for not altering CUDA_VISIBLE_DEVICES at all)
  --max_parallel MAX_PARALLEL
                        Maximum number of simultaneous experiments (only for local multiprocessing)

Slurm-related:
  --slurm_gpus_per_job SLURM_GPUS_PER_JOB
                        GPUs in a single Slurm job
  --slurm_cpus_per_gpu SLURM_CPUS_PER_GPU
                        Max allowed number of CPU cores per allocated GPU
  --slurm_print_only SLURM_PRINT_ONLY
                        Just print commands to the console without executing
  --slurm_workdir SLURM_WORKDIR
                        Optional workdir. Used by the Slurm launcher to store logfiles etc.
  --slurm_partition SLURM_PARTITION
                        Slurm partition; e.g. for "gpu" the launcher adds "-p gpu" to the sbatch command line
  --slurm_sbatch_template SLURM_SBATCH_TEMPLATE
                        Commands to run before the actual experiment (e.g. activate a conda env), typically a shell script.
                        Example: https://github.com/alex-petrenko/megaverse/blob/master/megaverse_rl/slurm/sbatch_template.sh
  --slurm_timeout SLURM_TIMEOUT
                        Time to run a job before it times out and is requeued. Defaults to 0, i.e. no timeout

NGC-related:
  --ngc_job_template NGC_JOB_TEMPLATE
                        NGC command-line template specifying the instance type, Docker container, etc.
  --ngc_print_only NGC_PRINT_ONLY
                        Just print commands to the console without executing