Vishand03 committed on
Commit
ea4774b
·
verified ·
1 Parent(s): 554bfa9

Create README.md

Files changed (1)
  1. README.md +83 -0
README.md ADDED
@@ -0,0 +1,83 @@
---
library_name: numpy
tags:
- Taxi-v3
- reinforcement-learning
- q-learning
- custom-implementation
model-index:
- name: Q-Learning
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Taxi-v3
      type: Taxi-v3
    metrics:
    - type: mean_reward
      name: mean_reward
      value: 7.92 +/- 2.60
      verified: false
---

# 🚖 Q-Learning Agent for Taxi-v3

This is a **Q-Learning agent** trained on the **Taxi-v3** environment using a **tabular approach**: the policy is stored as a Q-table indexed by state and action.

## Developer
**Vishand S (@Vishand03)**

## Frameworks
- Python
- NumPy
- Gymnasium

## Training Details
- Algorithm: Q-Learning
- Episodes: 2,000,000
- Max Steps per Episode: 200
- Learning rate (α): 0.1
- Discount factor (γ): 0.99
- Exploration: Epsilon-greedy
- Epsilon decay rate: 0.0005
- Mean Reward: 7.92 ± 2.60
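
The hyperparameters above can be tied together in a minimal sketch of one tabular Q-learning step. This is not the training script used for this model, only an illustration under stated assumptions: the exponential form of the epsilon schedule and the `MAX_EPSILON` and `MIN_EPSILON` values are not given in this README, and the function names are hypothetical. The Q-table is a plain list of lists, so the sketch needs only the standard library. It moves `Q(s, a)` toward the target `reward + γ · max Q(s', ·)` by a fraction α, and skips bootstrapping when the episode has terminated.

```python
import math
import random

ALPHA = 0.1           # learning rate (from Training Details)
GAMMA = 0.99          # discount factor (from Training Details)
DECAY_RATE = 0.0005   # epsilon decay rate (from Training Details)
MAX_EPSILON = 1.0     # ASSUMPTION: starting exploration rate, not stated in the README
MIN_EPSILON = 0.05    # ASSUMPTION: exploration floor, not stated in the README


def epsilon_for_episode(episode):
    """ASSUMED exponential schedule decaying from MAX_EPSILON toward MIN_EPSILON."""
    return MIN_EPSILON + (MAX_EPSILON - MIN_EPSILON) * math.exp(-DECAY_RATE * episode)


def epsilon_greedy(qtable, state, epsilon, n_actions, rng=random):
    """Take a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: qtable[state][a])


def q_update(qtable, state, action, reward, next_state, terminated):
    """One tabular Q-learning step: move Q(s, a) toward the TD target."""
    # Do not bootstrap from a terminal state
    best_next = 0.0 if terminated else max(qtable[next_state])
    td_target = reward + GAMMA * best_next
    qtable[state][action] += ALPHA * (td_target - qtable[state][action])


# Tiny demonstration on a hypothetical 2-state, 2-action table
qtable = [[0.0, 0.0], [0.0, 1.0]]
q_update(qtable, state=0, action=1, reward=-1.0, next_state=1, terminated=False)
print(qtable[0][1])
```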

---

## 🛠 Usage

```python
import gymnasium as gym
import pickle
from huggingface_hub import hf_hub_download

# -------------------------
# Load pretrained model
# -------------------------
model_file = hf_hub_download("Vishand03/q-Taxi-v3", "q-learning.pkl")
with open(model_file, "rb") as f:
    model = pickle.load(f)

env = gym.make(model["env_id"])

# -------------------------
# Evaluate agent
# -------------------------
def greedy_policy(Qtable, state):
    # Pick the action with the highest Q-value for this state
    return max(range(len(Qtable[state])), key=lambda a: Qtable[state][a])

total_rewards = []
for _ in range(model["n_eval_episodes"]):
    state, _ = env.reset()
    done = False
    episode_reward = 0
    while not done:
        action = greedy_policy(model["qtable"], state)
        state, reward, terminated, truncated, _ = env.step(action)
        episode_reward += reward
        # Stop on either a terminal state or the step limit
        done = terminated or truncated
    total_rewards.append(episode_reward)

mean_reward = sum(total_rewards) / len(total_rewards)
print(f"Mean Reward: {mean_reward:.2f}")
```