# Configuration
Sample Factory experiments are configured via command line parameters. The following command prints a help message for the given algorithm-environment combination, listing all parameters, their descriptions, and their default values:
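```bash
python -m sf_examples.train_gym_env --env=CartPole-v1 --help
```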
(Replace `train_gym_env` with your own training script name and `CartPole-v1` with a different environment name to get information about parameters specific to that particular environment.)
Default parameter values and their help strings are defined in `sample_factory/cfg/cfg.py`. Besides that, additional parameters can be defined in specific environment integrations, for example in `sf_examples/envpool/mujoco/envpool_mujoco_params.py`.
## config.json
Once a new experiment is started, a directory containing experiment-related files is created in the `--train_dir` location (or `./train_dir` in the current working directory if `--train_dir` is not passed on the command line). This directory contains a `config.json` file where all the experiment parameters are saved, including those instantiated from their default values.

In addition, selected parameter values are printed to the console and thus saved to the `sf_log.txt` file in the experiment directory.

Running an experiment and then stopping it to check the parameter values is good practice to make sure the experiment is configured as expected.
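You can also spot-check the saved configuration programmatically. A minimal sketch, assuming an experiment named `my_experiment` under the default `./train_dir` (both names here are placeholders):

```python
import json
from pathlib import Path

# Hypothetical experiment name under the default --train_dir location
config_path = Path("train_dir") / "my_experiment" / "config.json"

with config_path.open() as f:
    cfg = json.load(f)

# Print a few parameters to verify the experiment is configured as expected
for key in ("env", "num_workers", "num_envs_per_worker", "rollout", "batch_size"):
    print(f"{key} = {cfg.get(key)}")
```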
## Key parameters
- `--env` (required) full name that uniquely identifies the environment as it is registered in the environment registry (see the `register_env()` function).
- `--experiment` a name that uniquely identifies the experiment and the experiment folder, e.g. `--experiment=my_experiment`. If an experiment folder with this name already exists, the experiment will (by default) be resumed! Resuming experiments after a stop is the default behavior in Sample Factory. When an experiment is resumed, parameters specified on the command line are taken into account, and unspecified parameters are loaded from the existing experiment's `config.json` file. If you want to start a new experiment, delete the old experiment folder or change the experiment name. You can also use `--restart_behavior=[resume|restart|overwrite]` to control this behavior.
- `--train_dir` location for all experiment folders, defaults to `./train_dir`.
- `--num_workers` defaults to the number of logical cores in the system, which gives the best throughput in most scenarios.
- `--num_envs_per_worker` greatly affects performance. Large values (15-30) improve hardware utilization but increase memory usage and policy lag. Must be even for double-buffered sampling to work; disable double-buffered sampling by setting `--worker_num_splits=1` to use an odd number of envs per worker (e.g. 1 env per worker). (Default: 2.) A good rule of thumb is to start with a relatively low value (e.g. 4 or 8 for common envs) and then increase it until you see no further performance improvement or you start losing sample efficiency due to policy lag.
- `--rollout` the length of the trajectory collected by each agent.
- `--batch_size` the minibatch size for SGD.
- `--num_batches_per_epoch` the number of minibatches the training batch (dataset) is split into.
- `--num_epochs` the number of epochs on the learner over one training batch (dataset).
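For example, a hypothetical training invocation combining several of these flags (the values are chosen purely for illustration) might look like:

```bash
python -m sf_examples.train_gym_env \
    --env=CartPole-v1 \
    --experiment=my_experiment \
    --num_workers=8 \
    --num_envs_per_worker=8 \
    --rollout=32 \
    --batch_size=1024
```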
The above six parameters (`batch_size`, `num_batches_per_epoch`, `rollout`, `num_epochs`, `num_workers`, `num_envs_per_worker`) have the biggest influence on the data regime of the RL algorithm and thus on the sample efficiency and the training speed.

`num_workers`, `num_envs_per_worker`, and `rollout` define how many samples are collected per iteration (one rollout for all envs), which is `sampling_size = num_workers * num_envs_per_worker * rollout` (note that this is further multiplied by the env's `num_agents` for multi-agent envs).

`batch_size` and `num_batches_per_epoch` define how many samples are used for training per iteration. If `sampling_size >> batch_size`, we will need many iterations of training to go through the data, which will make some experience stale by the time it is used for training (policy lag). See Policy Lag for additional information.
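To make the relationship concrete, here is a small back-of-the-envelope sketch; the parameter values below are made up purely for illustration:

```python
# Hypothetical values, chosen only to illustrate the formulas above
num_workers = 16          # e.g. one worker per logical core
num_envs_per_worker = 8   # even, so double-buffered sampling works
rollout = 32              # trajectory length collected by each agent
num_agents = 1            # single-agent env; multiply by num_agents otherwise

batch_size = 1024
num_batches_per_epoch = 4

# Samples collected per iteration (one rollout for all envs)
sampling_size = num_workers * num_envs_per_worker * rollout * num_agents
print(sampling_size)  # 4096

# Samples consumed by one epoch of training
training_size = batch_size * num_batches_per_epoch
print(training_size)  # 4096

# If sampling_size >> batch_size, many minibatches are needed to consume the
# data, so some experience becomes stale before it is used (policy lag).
# Here the two sides are balanced: each sample is used once per epoch.
```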
## Evaluation script parameters
Evaluation scripts (e.g. `sf_examples/atari/enjoy_atari.py`) use the same configuration parameters as training scripts for simplicity, although many of them are of course ignored since they don't affect evaluation. In addition, evaluation scripts provide extra parameters; see `add_eval_args()` in `sample_factory/cfg/cfg.py`.

The HuggingFace Hub integration guide provides a good overview of the important parameters such as `--save_video`, check it out!
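For instance, a hypothetical evaluation command (the environment and experiment names are placeholders; use whatever you trained with) might look like:

```bash
python -m sf_examples.atari.enjoy_atari --env=atari_breakout --experiment=my_experiment --save_video
```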
## Full list of parameters
Please see the Full Parameter Reference, auto-generated using the `--help` flag, for the full list of available command line arguments.