PPO Agent playing LunarLander-v2
This is a trained model of a PPO agent playing LunarLander-v2 using the stable-baselines3 library. Training time was under 4 minutes on a MacBook M1 Pro.
Below is the base code, which scored ~240. For the final submission, I re-trained the model 2-3 times to get a slightly improved result.
Usage (with Stable-baselines3)
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv
from huggingface_sb3 import load_from_hub
def fast_start_decay(initial_value, final_value, power=2):  # power-law decay
    """
    LR decreases FAST at the beginning, then slowly later.
    power > 1 makes the curve drop faster early.
    """
    def lr_func(progress_remaining):
        # progress_remaining goes 1 -> 0 over training
        t = 1 - progress_remaining  # 0 -> 1
        return initial_value - (initial_value - final_value) * (t ** (1 / power))
    return lr_func
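
# Sanity check of the schedule (values computed from the formula above):
# fast_start_decay(0.001, 0.0001, 2) gives lr = 0.001 at progress_remaining=1.0,
# ~0.00036 at 0.5, and 0.0001 at 0.0 -- most of the decay happens early, as intended.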
if __name__ == "__main__":  # SubprocVecEnv spawns worker processes, so guard the entry point
    env = make_vec_env("LunarLander-v2", n_envs=8, vec_env_cls=SubprocVecEnv)

    model = PPO(
        policy="MlpPolicy",
        env=env,
        n_steps=2048,
        batch_size=64,
        n_epochs=10,
        gamma=0.998,
        gae_lambda=0.97,
        learning_rate=fast_start_decay(0.001, 0.0001, 2),
        clip_range=fast_start_decay(0.8, 0.1, 3),
        ent_coef=0.01,
        tensorboard_log="./ppo_lunarlander_tb/",
    )
    model.learn(total_timesteps=int(8e5), tb_log_name="PPO_LunarLander")
...
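
The snippet above imports load_from_hub but never uses it; a minimal sketch of pulling a trained checkpoint back down and evaluating it could look like the following. The repo_id and filename are hypothetical placeholders, not the actual repo coordinates of this model.

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

# Hypothetical repo coordinates -- replace with the actual model repo and filename
checkpoint = load_from_hub(
    repo_id="<your-username>/ppo-LunarLander-v2",
    filename="ppo-LunarLander-v2.zip",
)
model = PPO.load(checkpoint)

# Evaluate on a fresh single environment
eval_env = make_vec_env("LunarLander-v2", n_envs=1)
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

To re-train on top of a loaded checkpoint (as was done for the final submission), one standard option in SB3 is model.set_env(env) followed by model.learn(..., reset_num_timesteps=False).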
Evaluation results
- mean_reward on LunarLander-v2 (self-reported): 277.16 +/- 20.55