Implementation of Imitation Bootstrapped Reinforcement Learning (IBRL) and baselines (RLPD, RFT) on Robomimic and Meta-World tasks.
[Sep 2024] A fix to set_env.sh has been pushed to address the slow parallel evaluation problem in Robomimic.
We need --recursive to get the correct submodules:
git clone --recursive https://github.com/hengyuan-hu/ibrl.git

First, install MuJoCo.
Download the MuJoCo version 2.1 binaries for Linux.
Extract the downloaded mujoco210 directory into ~/.mujoco/mujoco210.
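For example, on Linux this typically looks like the following (the URL below is the standard MuJoCo 2.1 release link; adjust it if it has moved):
# download and extract the MuJoCo 2.1 binaries into ~/.mujoco
mkdir -p ~/.mujoco
wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
tar -xzf mujoco210-linux-x86_64.tar.gz -C ~/.mujoco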
Then create a conda env named ibrl:
conda create --name ibrl python=3.9

Then, source set_env.sh to activate the ibrl conda env. It also sets up several important paths, such as MUJOCO_PY_MUJOCO_PATH, and adds the current project folder to PYTHONPATH.
Note that if the conda env has a different name, or if MuJoCo is not installed at the default location, you will need to modify set_env.sh manually.
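For reference, the relevant lines in set_env.sh look roughly like this (a hypothetical sketch based on the description above, not the exact file contents):
# activate the conda env; change "ibrl" if your env has a different name
conda activate ibrl
# point mujoco-py at the MuJoCo install; change this if MuJoCo lives elsewhere
export MUJOCO_PY_MUJOCO_PATH=$HOME/.mujoco/mujoco210
# add the current project folder to PYTHONPATH
export PYTHONPATH=$PWD:$PYTHONPATH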
# NOTE: run this once per shell before running any script from this repo
source set_env.sh

Then install the Python dependencies:
# first install pytorch with correct cuda version, in our case we use torch 2.1 with cu121
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121
# then install extra dependencies from requirement.txt
pip install -r requirements.txt

If the commands above do not work for your versions, check out tools/core_packages.txt for a list of commands to manually install the relevant packages.
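A quick sanity check that PyTorch was installed with working CUDA support (a generic check, not a repo script):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"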
We have a C++ module in common_utils that requires compilation:
cd common_utils
make

Later, when running the training commands, if you encounter the following error:
ImportError: .../libstdc++.so.6: version `GLIBCXX_3.4.30' not found

then you can force conda to use the system C++ library.
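You can first confirm that the system library actually provides the required symbol version (a generic diagnostic, assuming GNU binutils' strings is available):
strings /lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX_3.4.30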
Use these commands to symlink the system C++ library into the conda env. To find PATH_TO_CONDA_ENV, run echo ${CONDA_PREFIX:-"$(dirname $(which conda))/../"}.
ln -sf /lib/x86_64-linux-gnu/libstdc++.so.6 PATH_TO_CONDA_ENV/bin/../lib/libstdc++.so
ln -sf /lib/x86_64-linux-gnu/libstdc++.so.6 PATH_TO_CONDA_ENV/bin/../lib/libstdc++.so.6

Remember to run source set_env.sh once per shell before running any script from this repo.
Download the dataset and models from Google Drive and put the folders under the release folder. The release folder should contain release/cfgs (already shipped with the repo), release/data, and release/model (the latter two are from the downloaded zip file).
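After unpacking, the layout should look roughly like this:
release/
├── cfgs/    # shipped with the repo
├── data/    # from the downloaded zip
└── model/   # from the downloaded zip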
Train the RL policy (IBRL) using the BC policy provided in the release folder:
# can
python train_rl.py --config_path release/cfgs/robomimic_rl/can_ibrl.yaml
# square
python train_rl.py --config_path release/cfgs/robomimic_rl/square_ibrl.yaml

Use --save_dir PATH to specify where to store the logs and models.
Use --use_wb 0 to disable logging to Weights & Biases.
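For example, the following combines both flags (the save path exps/can_ibrl is only an illustration):
python train_rl.py --config_path release/cfgs/robomimic_rl/can_ibrl.yaml --save_dir exps/can_ibrl --use_wb 0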
Use the following commands to train a BC policy from scratch. We find that IBRL is not sensitive to the exact performance of the BC policy.
# can
python train_bc.py --config_path release/cfgs/robomimic_bc/can.yaml
# square
python train_bc.py --config_path release/cfgs/robomimic_bc/square.yaml

The following commands run the RLPD baseline:
# can
python train_rl.py --config_path release/cfgs/robomimic_rl/can_rlpd.yaml
# square
python train_rl.py --config_path release/cfgs/robomimic_rl/square_rlpd.yaml

These commands run RFT from the pretrained models in the release folder:
# can rft
python train_rl.py --config_path release/cfgs/robomimic_rl/can_rft.yaml
# square rft
python train_rl.py --config_path release/cfgs/robomimic_rl/square_rft.yaml

To only perform pretraining:
# can, pretraining for 5 x 10,000 steps
python train_rl.py --config_path release/cfgs/robomimic_rl/can_rft.yaml --pretrain_only 1 --pretrain_num_epoch 5 --load_pretrained_agent None
# square, pretraining for 10 x 10,000 steps
python train_rl.py --config_path release/cfgs/robomimic_rl/square_rft.yaml --pretrain_only 1 --pretrain_num_epoch 10 --load_pretrained_agent None

Train IBRL using the provided state BC policies:
# can state
python train_rl.py --config_path release/cfgs/robomimic_rl/can_state_ibrl.yaml
# square state
python train_rl.py --config_path release/cfgs/robomimic_rl/square_state_ibrl.yaml

To train a state BC policy from scratch:
# can
python train_bc.py --config_path release/cfgs/robomimic_bc/can_state.yaml
# square
python train_bc.py --config_path release/cfgs/robomimic_bc/square_state.yaml

The following commands run the RLPD baseline with state observations:
# can state
python train_rl.py --config_path release/cfgs/robomimic_rl/can_state_rlpd.yaml
# square state
python train_rl.py --config_path release/cfgs/robomimic_rl/square_state_rlpd.yaml

Since state policies are fast to train, we can run RFT pretraining and RL fine-tuning in one step:
# can
python train_rl.py --config_path release/cfgs/robomimic_rl/can_state_rft.yaml
# square
python train_rl.py --config_path release/cfgs/robomimic_rl/square_state_rft.yaml

For Meta-World, train the RL policy (IBRL) using the BC policies provided in the release folder:
# assembly
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/ibrl_basic.yaml --bc_policy assembly
# boxclose
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/ibrl_basic.yaml --bc_policy boxclose
# coffeepush
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/ibrl_basic.yaml --bc_policy coffeepush
# stickpull
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/ibrl_basic.yaml --bc_policy stickpull

If you want to train a BC policy from scratch:
python mw_main/train_bc_mw.py --dataset.path Assembly --save_dir SAVE_DIR

For the baselines below, note that we still pass --bc_policy to specify the task name, but the BC policy itself is not used by the baselines. This is specific to train_rl_mw.py.
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/rlpd.yaml --bc_policy assembly --use_wb 0

For simplicity, this one command performs both RFT pretraining and RL training:
python mw_main/train_rl_mw.py --config_path release/cfgs/metaworld/rft.yaml --bc_policy assembly --use_wb 0

Citation:

@misc{hu2023imitation,
title={Imitation Bootstrapped Reinforcement Learning},
author={Hengyuan Hu and Suvir Mirchandani and Dorsa Sadigh},
year={2023},
eprint={2311.02198},
archivePrefix={arXiv},
primaryClass={cs.LG}
}