FlameF0X committed
Commit 157ff3c · verified · 1 Parent(s): 21c858b

Update README.md

Files changed (1)
1. README.md +64 -1
README.md CHANGED
@@ -1,6 +1,69 @@
  ---
  license: apache-2.0
+ ---
+ license: apache-2.0
  pipeline_tag: reinforcement-learning
  tags:
  - rlgym
- ---
+ - rocket-league
+ - RLBot
+ - PPO
+ ---
+
+ # CanoPy
+
+ CanoPy is a self-playing reinforcement learning Rocket League agent designed for the `RLBot Championship 2025`.
+ It uses PPO (Proximal Policy Optimization) to learn 2v2 gameplay through self-play. The agent is trained to play effectively on both blue and orange teams and can generalize to various team compositions.
+
+ ## Model Details
+
+ - **Framework:** RLGym + RLBot v5
+ - **Algorithm:** PPO (via `rlgym-ppo`)
+ - **Team size:** 2v2
+ - **Action repeat:** 8
+ - **Observations:** `DefaultObs` with normalized positions, angles, velocities, and boost
+ - **Action space:** Lookup table actions with repeat frames
+ - **Reward shaping:** Combined reward including:
+   - Speed toward ball
+   - In-air bonus
+   - Ball velocity toward goal
+   - Goal scoring reward
+
+ ## Training Configuration (from `config.json`)
+
+ - **Number of processes:** 4
+ - **Minimum inference ratio:** 80%
+ - **Steps per checkpoint:** 1,000,000
+ - **PPO batch size:** 100,000
+ - **PPO minibatch size:** 50,000
+ - **PPO epochs per update:** 2
+ - **Experience buffer size:** 300,000
+ - **Policy network layers:** [256, 128]
+ - **Critic network layers:** [256, 128]
+ - **Policy learning rate:** 0.0001
+ - **Critic learning rate:** 0.0001
+ - **PPO entropy coefficient:** 0.01
+ - **Standardize returns:** true
+ - **Standardize observations:** false
+ - **Total training steps:** 1,000,000,000
+ - **Checkpoint directory:** ./checkpoints
+
+ ## Intended Use
+
+ CanoPy is intended for research, competition, and experimentation within the RLBot framework. It is designed to compete in the ML bot bracket of the RLBot Championship 2025.
+
+ ## Limitations
+
+ - Performance is dependent on training; untrained or partially trained models may perform poorly.
+ - The bot has been trained for standard Rocket League 2v2 matches; it may not generalize to unusual map sizes, mutators, or game modes.
+ - Does not include human-like strategy beyond what PPO has learned from self-play.
+
+ ## Evaluation
+
+ CanoPy can be evaluated using the `evaluate()` function in the training script. Evaluation is expected to report average episode returns from matches against copies of itself.
+ - **Note:** To meet RLBot Championship submission requirements, further testing against Psyonix Pro bots may be necessary.
+
+ ## Contact / Author
+
+ - **Author:** FlameF0X (Discord handle: `@flame_f0x`)
+ - **Competition:** RLBot Championship 2025
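
The "Reward shaping" entry in the README's Model Details describes a weighted combination of simple terms. As a rough illustration only (the function names, weights, and toy state values below are assumptions, not taken from CanoPy's training code), a combined reward of that kind can be sketched as:

```python
import numpy as np

# Hypothetical sketch of a weighted combined reward in the spirit of the one
# described in the README: speed toward ball, in-air bonus, ball velocity
# toward goal, and a goal-scoring term. Weights and values are illustrative.

def speed_toward_ball(car_pos, car_vel, ball_pos):
    """Project the car's velocity onto the unit vector pointing at the ball."""
    to_ball = ball_pos - car_pos
    dist = np.linalg.norm(to_ball)
    if dist < 1e-6:
        return 0.0
    return float(np.dot(car_vel, to_ball / dist))

def combined_reward(components, weights):
    """Weighted sum of individual reward terms."""
    return sum(w * r for w, r in zip(weights, components))

# One toy step: placeholder vectors in place of real game-state data.
car_pos = np.array([0.0, -1000.0, 17.0])
car_vel = np.array([0.0, 500.0, 0.0])
ball_pos = np.array([0.0, 0.0, 93.0])

terms = [
    speed_toward_ball(car_pos, car_vel, ball_pos),  # speed toward ball
    1.0,                                            # in-air bonus (1 if airborne)
    0.0,                                            # ball velocity toward goal
    0.0,                                            # goal scored this step
]
weights = [0.05, 0.01, 0.1, 10.0]  # illustrative weights only
print(combined_reward(terms, weights))
```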
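The Training Configuration section lists values read from `config.json`. A guess at how that file might be expressed in code is below; the key names are hypothetical and only the values come from the README (in `rlgym-ppo`, settings like these are normally passed as keyword arguments when constructing the learner):

```python
import json

# Hypothetical mirror of the values listed under "Training Configuration".
# Key names are illustrative; only the numbers are taken from the README.
training_config = {
    "num_processes": 4,
    "min_inference_ratio": 0.80,          # 80%
    "ts_per_checkpoint": 1_000_000,       # steps per checkpoint
    "ppo_batch_size": 100_000,
    "ppo_minibatch_size": 50_000,
    "ppo_epochs": 2,
    "exp_buffer_size": 300_000,
    "policy_layer_sizes": [256, 128],
    "critic_layer_sizes": [256, 128],
    "policy_lr": 1e-4,
    "critic_lr": 1e-4,
    "ppo_ent_coef": 0.01,
    "standardize_returns": True,
    "standardize_obs": False,
    "timestep_limit": 1_000_000_000,      # total training steps
    "checkpoints_save_folder": "./checkpoints",
}

# Write the configuration in the same spirit as the config.json the README cites.
with open("config.json", "w") as f:
    json.dump(training_config, f, indent=2)
```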
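The Evaluation section says `evaluate()` reports average episode returns from self-play. A minimal sketch of that kind of helper is below; the stub environment and random policy are stand-ins so the snippet runs on its own, and the real `evaluate()` in CanoPy's training script (which drives RLGym matches) will differ:

```python
import random

class StubEnv:
    """Minimal Gym-style environment standing in for an RLGym match."""
    def __init__(self, horizon=100):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = random.uniform(-1.0, 1.0)
        done = self.t >= self.horizon
        return 0.0, reward, done, {}

def random_policy(obs):
    return 0  # dummy action

def average_episode_return(env, policy, episodes=10):
    """Average undiscounted return over a number of episodes."""
    totals = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        totals.append(total)
    return sum(totals) / len(totals)

print(average_episode_return(StubEnv(), random_policy))
```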