pytorch-a2c-ppo-acktr
08 Jan 2022
Installation
docker run --gpus all -it --name torch_rl \
-v /home1/irteam/users/seosh/pytorch-a2c-ppo-acktr-gail:/workspace \
--shm-size=8g --ulimit memlock=-1 --ulimit stack=67108864 \
-p 8888:8888 -p 6006:6006 \
--device=/dev/snd \
pytorch/pytorch:1.8.1-cuda11.1-cudnn8-devel
apt-get update && apt-get install -y sudo &&\
sudo apt-get install -y tree python3.6 curl zip unzip tar git-all wget vim
echo "alias python=python3" >> ~/.bashrc
. ~/.bashrc
git clone https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail
wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.10.3-Linux-x86_64.sh &&\
bash Miniconda3-py38_4.10.3-Linux-x86_64.sh &&\
echo "export PATH=~/miniconda3/bin:$PATH" >> ~/.bashrc &&\
. ~/.bashrc &&\
conda update -n base -c defaults conda
cd pytorch-a2c-ppo-acktr-gail &&\
# conda install pytorch torchvision -c soumith &&\
# pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html &&\
pip install -r requirements.txt &&\
conda install -c conda-forge gym-atari
At this point, check the pinned versions in requirements.txt:
pytorch-a2c-ppo-acktr-gail$ head -5 requirements.txt
gym<0.20
matplotlib
pybullet
stable-baselines3
h5py
pip uninstall gym &&\
pip install gym==0.19.0 &&\
pip install stable-baselines3
python -c "import gym; print(gym.__version__); print(gym.envs.registry.all())"
import gym

env = gym.make('CartPole-v1')
# env is created, now we can use it:
for episode in range(10):
    observation = env.reset()
    for step in range(50):
        action = env.action_space.sample()  # or, given a custom model: action = policy(observation)
        observation, reward, done, info = env.step(action)
        # note: `done` is ignored here, so we keep stepping past termination
        print('{} episode, {} step'.format(episode + 1, step + 1))
        print('action : {}'.format(action))
        print('observation : {}'.format(observation))
        print('reward : {}'.format(reward))
        print()
10 episode, 47 step
action : 0
observation : [-0.62778812 -0.56094966 2.78975412 7.80858407]
reward : 0.0
10 episode, 48 step
action : 0
observation : [-0.63900712 -0.72946692 2.9459258 7.6726126 ]
reward : 0.0
10 episode, 49 step
action : 0
observation : [-0.65359645 -0.91018776 3.09937806 7.46386373]
reward : 0.0
10 episode, 50 step
action : 1
observation : [-0.67180021 -0.71219321 3.24865533 7.77299837]
reward : 0.0
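The 0.0 rewards near step 50 above come from stepping past termination: in gym's pre-0.26 four-tuple API, an episode should stop (or reset) as soon as `done` is True. A minimal sketch of the correct pattern; `ToyEnv` is a hypothetical stand-in for a gym env, just to keep the example self-contained:

```python
# Episode loop that respects `done` (old gym 4-tuple step API).
# `ToyEnv` is a hypothetical stand-in for a real gym env.
import random

class ToyEnv:
    def reset(self):
        self.t = 0
        return 0.0                      # dummy observation
    def step(self, action):
        self.t += 1
        done = self.t >= 8              # episode "terminates" after 8 steps
        return 0.0, 1.0, done, {}      # obs, reward, done, info

env = ToyEnv()
episode_lengths = []
for episode in range(3):
    obs = env.reset()
    steps = 0
    while True:
        action = random.choice([0, 1])  # env.action_space.sample() with real gym
        obs, reward, done, info = env.step(action)
        steps += 1
        if done:                        # stop stepping once the episode ends
            break
    episode_lengths.append(steps)
print(episode_lengths)                  # every episode runs exactly 8 steps here
```

With a real gym env, replace `ToyEnv` with `gym.make('CartPole-v1')` and the random choice with `env.action_space.sample()`.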
Atari
conda install -c conda-forge gym-atari
python -c "import gym; env = gym.make('CartPole-v1'); print(env)" &&\
python -c "import gym; env = gym.make('MsPacman-v0'); print(env)"
gym.error.DependencyNotInstalled: No module named 'atari_py'. (HINT: you can install Atari dependencies by running 'pip install gym[atari]'.)
pip install gym[atari]
# cmake
sudo apt install cmake
pip install gym[atari]
Mujoco
python -c "import gym; env = gym.make('CartPole-v1'); print(env)" &&\
python -c "import gym; env = gym.make('MsPacman-v0'); print(env)" &&\
python -c "import gym; env = gym.make('Hopper-v2'); print(env)"
I followed Neptune.ai's [Installing MuJoCo to Work With OpenAI Gym Environments].
wget https://www.roboti.us/download/mjpro150_linux.zip &&\
unzip mjpro150_linux.zip &&\
mkdir -p ~/.mujoco &&\
mv mjpro150 ~/.mujoco/
The license key is at link:
wget https://www.roboti.us/file/mjkey.txt &&\
mv mjkey.txt ~/.mujoco/
pip3 install -U 'mujoco-py<1.50.2,>=1.50.1'
...
Exception:
Missing path to your environment variable.
Current values LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
Please add following line to .bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/.mujoco/mjpro150/bin
----------------------------------------
ERROR: Failed building wheel for mujoco-py
#vim ~/.bashrc
#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/.mujoco/mjpro150/bin
#source ~/.bashrc
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/.mujoco/mjpro150/bin" >> ~/.bashrc
. ~/.bashrc
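mujoco-py only checks whether the mjpro150/bin directory appears in `LD_LIBRARY_PATH`, so it is worth verifying that the `.bashrc` line actually took effect in the current shell. A quick check (the helper name is mine):

```python
import os

def has_mujoco_path(ld_library_path, mujoco_bin="/root/.mujoco/mjpro150/bin"):
    """Return True if the MuJoCo bin directory is on the colon-separated path."""
    return mujoco_bin in ld_library_path.split(":")

# Check the live environment (empty string if the variable is unset):
print(has_mujoco_path(os.environ.get("LD_LIBRARY_PATH", "")))
```

If this prints `False` after sourcing `.bashrc`, the export line did not reach the current process (e.g. it was added under a different user's home directory).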
...
/opt/conda/compiler_compat/ld: cannot find -lGL
collect2: error: ld returned 1 exit status
error: command 'gcc' failed with exit status 1
sudo apt install libgl1-mesa-dev
pip install -U 'mujoco-py<1.50.2,>=1.50.1' &&\
python -c 'import mujoco_py'
python -c "import gym; env = gym.make('CartPole-v1'); print(env)" &&\
python -c "import gym; env = gym.make('MsPacman-v0'); print(env)" &&\
python -c "import gym; env = gym.make('Hopper-v2'); print(env)"
pip install gym[all]==0.19.0
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())" &&\
python -c "import gym; from stable_baselines3 import PPO; env = gym.make('CartPole-v0'); print(env); model = PPO('MlpPolicy', env,verbose=1); print(model); env.close()"
Test
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)

obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()

env.close()
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/ | |
| ep_len_mean | 20.1 |
| ep_rew_mean | 20.1 |
| time/ | |
| fps | 663 |
| iterations | 1 |
| time_elapsed | 3 |
| total_timesteps | 2048 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 25.9 |
| ep_rew_mean | 25.9 |
| time/ | |
| fps | 476 |
| iterations | 2 |
| time_elapsed | 8 |
| total_timesteps | 4096 |
| train/ | |
| approx_kl | 0.008755123 |
| clip_fraction | 0.0964 |
| clip_range | 0.2 |
| entropy_loss | -0.686 |
| explained_variance | -0.000423 |
| learning_rate | 0.0003 |
| loss | 6.32 |
| n_updates | 10 |
| policy_gradient_loss | -0.0139 |
| value_loss | 45.4 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 34.4 |
| ep_rew_mean | 34.4 |
| time/ | |
| fps | 429 |
| iterations | 3 |
| time_elapsed | 14 |
| total_timesteps | 6144 |
| train/ | |
| approx_kl | 0.011211051 |
| clip_fraction | 0.0855 |
| clip_range | 0.2 |
| entropy_loss | -0.666 |
| explained_variance | 0.118 |
| learning_rate | 0.0003 |
| loss | 13.2 |
| n_updates | 20 |
| policy_gradient_loss | -0.0213 |
| value_loss | 31.4 |
-----------------------------------------
...
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/gym/envs/classic_control/rendering.py", line 27, in <module>
from pyglet.gl import *
File "/opt/conda/lib/python3.8/site-packages/pyglet/gl/__init__.py", line 95, in <module>
from pyglet.gl.lib import GLException
File "/opt/conda/lib/python3.8/site-packages/pyglet/gl/lib.py", line 149, in <module>
from pyglet.gl.lib_glx import link_GL, link_GLU, link_GLX
File "/opt/conda/lib/python3.8/site-packages/pyglet/gl/lib_glx.py", line 46, in <module>
glu_lib = pyglet.lib.load_library('GLU')
File "/opt/conda/lib/python3.8/site-packages/pyglet/lib.py", line 164, in load_library
raise ImportError('Library "%s" not found.' % names[0])
ImportError: Library "GLU" not found.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "render_test.py", line 14, in <module>
env.render()
File "/opt/conda/lib/python3.8/site-packages/gym/core.py", line 254, in render
return self.env.render(mode, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/gym/envs/classic_control/cartpole.py", line 179, in render
from gym.envs.classic_control import rendering
File "/opt/conda/lib/python3.8/site-packages/gym/envs/classic_control/rendering.py", line 29, in <module>
raise ImportError(
ImportError:
Error occurred while running `from pyglet.gl import *`
HINT: make sure you have OpenGL installed. On Ubuntu, you can run 'apt-get install python-opengl'.
If you're running on a server, you may need a virtual frame buffer; something like this should work:
'xvfb-run -s "-screen 0 1400x900x24" python <your_script.py>'
Rendering in Remote Machine
In reinforcement learning the agent interacts with an environment, and on a remote (headless) machine there are several ways to actually watch that interaction:
- Save images of the agent interacting with the environment every fixed number of steps (or epochs) and assemble them into a GIF
- Run the code in a Jupyter Notebook and render in real time
- Serve the rendering on a web page
- Connect to the machine with TeamViewer and watch directly
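The first option (saving frames and assembling a GIF) can be sketched as follows. With real gym you would get each frame from `env.render(mode='rgb_array')`; here `render_frame` is a hypothetical stand-in so the sketch runs without gym, and the actual GIF writing (e.g. `imageio.mimsave`) is left as a comment:

```python
# Collect a frame every `save_every` steps, then hand the list to a GIF writer.
# `render_frame` is a hypothetical stand-in for env.render(mode='rgb_array').
def render_frame(step):
    return [[step]]                     # real code: an H x W x 3 RGB array

def collect_frames(num_steps, save_every=10):
    frames = []
    for step in range(num_steps):
        # ... env.step(action) would go here ...
        if step % save_every == 0:
            frames.append(render_frame(step))
    return frames

frames = collect_frames(100, save_every=10)
print(len(frames))                      # 10 frames for 100 steps
# With imageio installed: imageio.mimsave('rollout.gif', frames, fps=30)
```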
sudo apt-get install freeglut3-dev
Traceback (most recent call last):
File "render_test.py", line 14, in <module>
env.render()
File "/opt/conda/lib/python3.8/site-packages/gym/core.py", line 254, in render
return self.env.render(mode, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/gym/envs/classic_control/cartpole.py", line 179, in render
from gym.envs.classic_control import rendering
File "/opt/conda/lib/python3.8/site-packages/gym/envs/classic_control/rendering.py", line 27, in <module>
from pyglet.gl import *
File "/opt/conda/lib/python3.8/site-packages/pyglet/gl/__init__.py", line 244, in <module>
import pyglet.window
File "/opt/conda/lib/python3.8/site-packages/pyglet/window/__init__.py", line 1880, in <module>
gl._create_shadow_window()
File "/opt/conda/lib/python3.8/site-packages/pyglet/gl/__init__.py", line 220, in _create_shadow_window
_shadow_window = Window(width=1, height=1, visible=False)
File "/opt/conda/lib/python3.8/site-packages/pyglet/window/xlib/__init__.py", line 165, in __init__
super(XlibWindow, self).__init__(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/pyglet/window/__init__.py", line 570, in __init__
display = pyglet.canvas.get_display()
File "/opt/conda/lib/python3.8/site-packages/pyglet/canvas/__init__.py", line 94, in get_display
return Display()
File "/opt/conda/lib/python3.8/site-packages/pyglet/canvas/xlib.py", line 123, in __init__
raise NoSuchDisplayException('Cannot connect to "%s"' % name)
pyglet.canvas.xlib.NoSuchDisplayException: Cannot connect to "None"
Following Stack Overflow's "How to run OpenAI Gym .render() over a server":
sudo apt install xvfb &&\
sudo apt install ffmpeg &&\
pip install pyvirtualdisplay
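Once xvfb and pyvirtualdisplay are installed, starting a virtual display before creating the env makes `env.render()` work headlessly. A minimal sketch, guarded so it degrades gracefully when the package or the Xvfb binary is missing:

```python
# Start a virtual framebuffer so env.render() works without a physical display.
try:
    from pyvirtualdisplay import Display
    display = Display(visible=0, size=(1400, 900))
    display.start()                     # sets DISPLAY for this process
except Exception:                       # pyvirtualdisplay or the Xvfb binary missing
    display = None

# With the display running, the failing render_test.py from above works unchanged:
# import gym
# env = gym.make('CartPole-v1')
# env.reset(); env.render(); env.close()

if display is not None:
    display.stop()
```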
Rendering OpenAI Gym Envs on Binder and Google Colab
Training and Enjoy
python main.py --env-name "Reacher-v2" --num-env-steps 1000000
Updates 12470, num timesteps 997680, FPS 3477
Last 10 training episodes: mean/median reward -25.9/-25.8, min/max reward -29.1/-19.2
Updates 12480, num timesteps 998480, FPS 3477
Last 10 training episodes: mean/median reward -26.1/-25.1, min/max reward -35.7/-21.4
Updates 12490, num timesteps 999280, FPS 3477
Last 10 training episodes: mean/median reward -26.4/-26.3, min/max reward -30.2/-17.1
python enjoy.py --load-dir trained_models/a2c --env-name "Reacher-v2"