MuJoCo¶
Installation¶
Install Sample Factory with its MuJoCo dependencies from PyPI:
pip install sample-factory[mujoco]
Running Experiments¶
Run MuJoCo experiments with the scripts in sf_examples.mujoco.
The default parameters were chosen to match CleanRL's results in the report below. Faster training is possible on a multi-core machine with better-tuned parameters.
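To train a model in the Ant-v4 environment:
python -m sf_examples.mujoco.train_mujoco --algo=APPO --env=mujoco_ant --experiment=<experiment_name>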
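To visualize the training results, use the enjoy_mujoco script:
python -m sf_examples.mujoco.enjoy_mujoco --algo=APPO --env=mujoco_ant --experiment=<experiment_name>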
If you have issues with the MuJoCo viewer in a Unix/Linux environment with Conda, try running the following before executing the enjoy_mujoco script:
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
python -m sf_examples.mujoco.enjoy_mujoco ...
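If you prefer to apply this workaround from a Python wrapper instead of exporting the variable in your shell, the following is a minimal sketch. The helper names `env_with_preload` and `run_enjoy` are hypothetical and not part of Sample Factory. The library path is the one shown above, and the arguments passed to `enjoy_mujoco` are left to the caller.

```python
import os
import subprocess
import sys

# Path taken from the workaround above; it may differ on your distribution.
LIBSTDCXX = "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"


def env_with_preload(base_env, lib_path=LIBSTDCXX):
    """Return a copy of base_env with LD_PRELOAD set, but only if lib_path exists."""
    env = dict(base_env)
    if os.path.exists(lib_path):
        env["LD_PRELOAD"] = lib_path
    return env


def run_enjoy(extra_args):
    """Run the enjoy_mujoco script in a child process with the adjusted environment.

    extra_args is the list of command-line arguments you would normally pass.
    """
    cmd = [sys.executable, "-m", "sf_examples.mujoco.enjoy_mujoco", *extra_args]
    return subprocess.run(cmd, env=env_with_preload(os.environ), check=False)
```

Setting the variable only for the child process keeps the rest of your session unaffected, which can help if preloading the system libstdc++ interferes with other tools in the Conda environment.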
Multiple experiments can be run in parallel with the launcher module. mujoco_all_envs is an example launcher script that runs all MuJoCo environments with 10 seeds each:
python -m sample_factory.launcher.run --run=sf_examples.mujoco.experiments.mujoco_all_envs --backend=processes --max_parallel=4 --pause_between=1 --experiments_per_gpu=10000 --num_gpus=1 --experiment_suffix=0
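To get a feel for how many runs such a sweep produces, the sketch below expands an environment-by-seed grid in plain Python. It is illustrative only and is not the launcher's implementation. The `expand_grid` helper is hypothetical, the seed values are placeholders (the real script defines its own), and the environment list assumes the nine supported environments.

```python
from itertools import product

# Values of the --env parameter for the supported MuJoCo environments.
ENVS = [
    "mujoco_ant",
    "mujoco_halfcheetah",
    "mujoco_hopper",
    "mujoco_humanoid",
    "mujoco_walker",
    "mujoco_doublependulum",
    "mujoco_pendulum",
    "mujoco_reacher",
    "mujoco_swimmer",
]

# Placeholder seeds; the actual launcher script chooses its own values.
SEEDS = list(range(10))


def expand_grid(envs, seeds):
    """Return one parameter dict per (environment, seed) combination."""
    return [{"env": env, "seed": seed} for env, seed in product(envs, seeds)]


runs = expand_grid(ENVS, SEEDS)
print(len(runs))  # 9 environments x 10 seeds = 90 runs
```

With `--max_parallel=4`, at most four of these runs would be active at any one time.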
List of Supported Environments¶
Specify the environment to run with the --env command line parameter. The following MuJoCo v4 environments are supported out of the box; more can be added as needed in sf_examples.mujoco.mujoco_utils.
MuJoCo Environment Name | Sample Factory Command Line Parameter |
---|---|
Ant-v4 | mujoco_ant |
HalfCheetah-v4 | mujoco_halfcheetah |
Hopper-v4 | mujoco_hopper |
Humanoid-v4 | mujoco_humanoid |
Walker2d-v4 | mujoco_walker |
InvertedDoublePendulum-v4 | mujoco_doublependulum |
InvertedPendulum-v4 | mujoco_pendulum |
Reacher-v4 | mujoco_reacher |
Swimmer-v4 | mujoco_swimmer |
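When scripting runs over several environments, it can help to turn this table into a lookup. The sketch below is illustrative only: the `train_command` helper and the experiment name `ant_example` are hypothetical. The `train_mujoco` module and the `--algo` and `--experiment` flags follow Sample Factory's usual command-line conventions, so confirm them against your installed version.

```python
import shlex

# Gym environment name -> value of Sample Factory's --env parameter.
MUJOCO_ENV_PARAMS = {
    "Ant-v4": "mujoco_ant",
    "HalfCheetah-v4": "mujoco_halfcheetah",
    "Hopper-v4": "mujoco_hopper",
    "Humanoid-v4": "mujoco_humanoid",
    "Walker2d-v4": "mujoco_walker",
    "InvertedDoublePendulum-v4": "mujoco_doublependulum",
    "InvertedPendulum-v4": "mujoco_pendulum",
    "Reacher-v4": "mujoco_reacher",
    "Swimmer-v4": "mujoco_swimmer",
}


def train_command(gym_name, experiment):
    """Build an argv list for a training run; raises KeyError for unsupported names."""
    env_param = MUJOCO_ENV_PARAMS[gym_name]
    return [
        "python",
        "-m",
        "sf_examples.mujoco.train_mujoco",  # assumed training entry point
        "--algo=APPO",
        f"--env={env_param}",
        f"--experiment={experiment}",  # hypothetical run name chosen by the caller
    ]


if __name__ == "__main__":
    # Print a shell-ready command line without executing it.
    print(shlex.join(train_command("Ant-v4", "ant_example")))
```

Returning an argv list instead of a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.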
Results¶
Reports¶
- Sample Factory was benchmarked on MuJoCo against CleanRL. Using the same parameters, Sample Factory achieved sample efficiency similar to CleanRL's.
- Sample Factory can run experiments synchronously or asynchronously. Asynchronous execution usually has worse sample efficiency but runs faster. The MuJoCo environments were compared under both modes.
- Sample Factory was compared with CleanRL in terms of wall time. Both experiments were run on a 16-core machine with 1 GPU. Sample Factory completed 10M samples 5 times as fast as CleanRL.
Models¶
APPO models trained on the MuJoCo environments have been uploaded to the HuggingFace Hub, each trained for 10M steps. Videos of the trained agents are available there as well.
The models below are the best models from the experiment against CleanRL above. The evaluation metrics here are obtained by running the model 10 times.
Environment | HuggingFace Hub Models | Evaluation Metrics |
---|---|---|
Ant-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-ant | 5876.09 ± 166.99 |
HalfCheetah-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-halfcheetah | 6262.56 ± 67.29 |
Humanoid-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-humanoid | 5439.48 ± 1314.24 |
Walker2d-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-walker | 5487.74 ± 48.96 |
Hopper-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-hopper | 2793.44 ± 642.58 |
InvertedDoublePendulum-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-doublependulum | 9350.13 ± 1.31 |
InvertedPendulum-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-pendulum | 1000.00 ± 0.00 |
Reacher-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-reacher | -4.53 ± 1.79 |
Swimmer-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-swimmer | 117.28 ± 2.91 |
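The "value ± spread" format in the table can be reproduced from a list of per-run returns. The sketch below assumes the spread is a standard deviation and uses the population form (`pstdev`). The source does not say which form was used, and the returns shown are made-up placeholder values, not the actual evaluation data.

```python
from statistics import mean, pstdev


def summarize(returns):
    """Format per-run returns as 'mean ± std' with two decimal places."""
    return f"{mean(returns):.2f} ± {pstdev(returns):.2f}"


# Hypothetical returns from 10 evaluation runs (placeholder values).
episode_returns = [1000.0] * 10
print(summarize(episode_returns))  # 1000.00 ± 0.00
```

If the sample standard deviation is wanted instead, replace `pstdev` with `stdev` from the same module.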
Videos¶
Below are some video examples of agents in various MuJoCo environments. Videos for all environments can be found on the HuggingFace Hub pages linked above.
HalfCheetah-v4¶
Ant-v4¶
InvertedDoublePendulum-v4¶