sedrickkeh committed e195b57 (verified) · Parent: 291e0c5

Update README.md

Files changed (1): README.md (+87 −15)
---
base_model:
- Qwen/Qwen3-8B
datasets:
- OpenThoughts-Agent-v1-SFT
- OpenThoughts-Agent-v1-RL
library_name: transformers
license: apache-2.0
model-index:
- name: OpenThinker-Agent-v1
  results: []
pipeline_tag: text-generation
tags:
- agents
- terminal
- code
- software-engineering
---

<p align="center">
  <img src="https://huggingface.co/datasets/open-thoughts/OpenThoughts1-Agent-SFT/resolve/main/ota-logo.png" width="50%">
</p>

<p align="center">
  <a href="https://open-thoughts.ai/agent" style="margin-right: 24px;">project</a> |
  <a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT dataset</a> |
  <a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL" style="margin-right: 24px; margin-left: 24px;">RL dataset</a> |
  <a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT model</a> |
  <a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1" style="margin-left: 24px;">RL model</a>
</p>

# OpenThinker-Agent-v1

**OpenThoughts-Agent** is an open-source effort to curate the best datasets for training agents. Our first release includes [datasets](https://huggingface.co/collections/open-thoughts/openthinker-agent), [models](https://huggingface.co/collections/open-thoughts/openthinker-agent), and our research codebase.

[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) is a model trained for agentic tasks such as **Terminal-Bench 2.0** and **SWE-Bench**.

The [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) model is post-trained from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B):
it is first SFT-ed on the [OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) dataset, then RL-ed on the [OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) dataset.

This checkpoint is the model after the SFT stage only. For the model after both the SFT and RL stages, see [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1).

- **Homepage:** https://www.open-thoughts.ai/agent
- **Repository:** https://github.com/open-thoughts/OpenThoughts-Agent
 
 
60
 
61
+ # OpenThinker-Agent-v1 Model Performance
62
 
63
+ Our [OpenThinker-Agent-v1](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) model is the state-of-the-art model at its scale on agent benchmarks.
64
 
65
+ | Model | Terminal-Bench 2.0 | SWE-Bench | OpenThoughts-TB-Dev |
66
+ | ----------------------------------------------------------------------------------------------- | ------------------ | --------- | ------------------- |
67
+ | **[OpenThinker-Agent-v1](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL)** | 4.9 | 15.7 | 17.3 |
68
+ | [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) | 10.1 | 51.6 | 24.5 |
69
+
70
+
# Data

We built [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) in two stages: **supervised fine-tuning**, followed by **reinforcement learning**.
Each stage required its own data pipeline: RL tasks (instructions, environments, and verifiers), and SFT traces from strong teacher agents completing those tasks.

[OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) is an SFT trace dataset containing approximately **15,200 traces** drawn from two data sources we curate:
- **nl2bash**: simple, synthetically generated tasks in which the agent must format shell commands effectively
- **InferredBugs**: a set of C# and Java bugs collected by Microsoft that we turned into tasks
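To give a flavor of the nl2bash source, a task pairs a natural-language instruction with a shell command the agent must produce; the example below is a hypothetical illustration, not an item from the dataset:

```shell
# Instruction: "Find all .log files modified in the last day and compress them."
# A correct agent response formats this as a single shell command:
find . -name '*.log' -mtime -1 -exec gzip {} +
```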

[OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) is an RL dataset containing ~720 tasks drawn from the **nl2bash verified** dataset.

To stabilize training, we built a three-stage filtration pipeline that prunes tasks before they ever reach the learner:

1. **Bad-verifier filter**: drop tasks with flaky or excessively slow verifiers.
2. **Environment-stability filter**: remove tasks whose containers take too long to build or tear down.
3. **Optional difficulty filter**: discard tasks that even a strong model (GPT-5 Codex) cannot solve in a single pass.
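The three filters above amount to a single pruning pass over candidate tasks. The `Task` record, field names, and thresholds in this sketch are illustrative assumptions, not the actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class Task:
    # All fields and thresholds are hypothetical stand-ins for whatever
    # the real pipeline records per task.
    name: str
    verifier_flaky: bool          # verifier gave inconsistent verdicts on reruns
    verifier_seconds: float       # wall-clock time of one verification
    container_build_seconds: float
    solved_by_strong_model: bool  # one-pass result from a strong reference model

def prune_tasks(tasks, apply_difficulty_filter=True,
                max_verifier_s=60.0, max_build_s=300.0):
    """Three-stage filtration: bad verifiers, unstable environments,
    and (optionally) tasks too hard even for a strong reference model."""
    kept = []
    for t in tasks:
        if t.verifier_flaky or t.verifier_seconds > max_verifier_s:
            continue  # stage 1: bad-verifier filter
        if t.container_build_seconds > max_build_s:
            continue  # stage 2: environment-stability filter
        if apply_difficulty_filter and not t.solved_by_strong_model:
            continue  # stage 3: optional difficulty filter
        kept.append(t)
    return kept
```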

### Training hyperparameters

The following hyperparameters were used during training:
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 7.0

### Framework versions

- Transformers 4.56.0
- Pytorch 2.9.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1

# Links

- 🌐 [OpenThoughts-Agent Project Page](https://open-thoughts.ai/agent)
- 💻 [OpenThoughts-Agent GitHub Repository](https://github.com/open-thoughts/OpenThoughts-Agent)
- 🧠 [OpenThoughts-Agent-v1-SFT dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT)
- 🧠 [OpenThoughts-Agent-v1-RL dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL)
- 🧠 [OpenThoughts-TB-dev dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-TB-dev)
- 🤖 [OpenThinker-Agent-v1 model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)
- 🤖 [OpenThinker-Agent-v1-SFT model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT) (this model)

# Citation

```bibtex
@misc{openthoughts-agent,
  author       = {Team, OpenThoughts-Agent},
  title        = {{OpenThoughts-Agent}},
  howpublished = {https://open-thoughts.ai/agent},
  month        = dec,
  year         = {2025}
}
```