Description
❓ Question
I installed stable-baselines3 and rl-baselines3-zoo, and installed minigrid (3.0.0) with `pip install minigrid`.
I run training with `python train.py --algo ppo --env MiniGrid-Empty-Random-5x5-v0 --eval-freq 10000 --eval-episodes 10 --n-eval-envs 1`.
The config is:

    MiniGrid-Empty-Random-5x5-v0: &minigrid-defaults
      env_wrapper: minigrid.wrappers.FlatObsWrapper  # See GH/1320#issuecomment-1421108191
      normalize: true
      n_envs: 8  # number of environment copies running in parallel
      n_timesteps: !!float 1e5
      policy: 'MlpPolicy'
      n_steps: 128  # rollout size per update is n_steps * n_envs
      batch_size: 64  # number of samples per training minibatch
      gae_lambda: 0.95  # bias/variance trade-off factor for Generalized Advantage Estimation
      gamma: 0.99
      n_epochs: 10  # number of epochs when optimizing the surrogate
      ent_coef: 0.0  # entropy coefficient for the loss calculation
      learning_rate: 2.5e-4  # the learning rate; it can be a function
      clip_range: 0.2  # clipping parameter; it can be a function
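For context, here is a quick sanity check (plain Python, values copied from the config) of what these hyperparameters imply for each PPO update. Note this is just arithmetic on the config values, not the zoo's internals:

```python
# Sizes implied by the config: n_envs=8, n_steps=128, batch_size=64, n_epochs=10.
n_envs = 8
n_steps = 128
batch_size = 64
n_epochs = 10
n_timesteps = 100_000

rollout_size = n_steps * n_envs                      # transitions collected per update
minibatches_per_epoch = rollout_size // batch_size   # SGD minibatches per epoch
gradient_steps_per_update = minibatches_per_epoch * n_epochs
n_updates = n_timesteps // rollout_size              # total PPO updates in the run

print(rollout_size)               # 1024
print(minibatches_per_epoch)      # 16
print(gradient_steps_per_update)  # 160
print(n_updates)                  # 97
```

So each update optimizes on 1024 transitions, and the whole 1e5-step run performs roughly 97 PPO updates.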
I don't change anything else, but the training curve looks odd:
Does anyone have thoughts on why this happens?
Checklist
- I have checked that there is no similar issue in the repo
- I have read the SB3 documentation
- I have read the RL Zoo documentation
- If code there is, it is minimal and working
- If code there is, it is formatted using the markdown code blocks for both code and stack traces.