|
|
--- |
|
|
base_model: |
|
|
- Qwen/Qwen3-8B |
|
|
datasets: |
|
|
- OpenThoughts-Agent-v1-SFT |
|
|
- OpenThoughts-Agent-v1-RL |
|
|
library_name: transformers |
|
|
license: apache-2.0 |
|
|
model-index: |
|
|
- name: OpenThinker-Agent-v1 |
|
|
results: [] |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- agents |
|
|
- terminal |
|
|
- code |
|
|
- software-engineering |
|
|
--- |
|
|
|
|
|
<p align="center"> |
|
|
<img src="https://huggingface.co/datasets/open-thoughts/OpenThoughts1-Agent-SFT/resolve/main/ota-logo.png" width="50%"> |
|
|
</p> |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://www.openthoughts.ai/blog/agent" style="margin-right: 24px;">Project</a> | |
|
|
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT dataset</a> | |
|
|
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL" style="margin-right: 24px; margin-left: 24px;">RL dataset</a> | |
|
|
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT model</a> | |
|
|
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1" style="margin-left: 24px;">RL model</a> |
|
|
</p> |
|
|
|
|
|
|
|
|
# OpenThinker-Agent-v1-SFT |
|
|
|
|
|
**OpenThoughts-Agent** is an open-source effort to curate the best datasets for training agents. Our first release includes [datasets](https://huggingface.co/collections/open-thoughts/openthinker-agent), [models](https://huggingface.co/collections/open-thoughts/openthinker-agent) and our [research codebase](https://github.com/open-thoughts/OpenThoughts-Agent). |
|
|
|
|
|
[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) is a model trained for agentic tasks such as **Terminal-Bench 2.0** and **SWE-Bench**. |
|
|
|
|
|
The [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) model is post-trained from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B). |
|
|
It is first supervised fine-tuned on the [OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) dataset, then trained with reinforcement learning on the [OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) dataset.
|
|
|
|
|
This [OpenThinker-Agent-v1-SFT](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT) model is the model after the SFT stage. For the model after both SFT and RL stages, see [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1). |
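A minimal usage sketch with the Hugging Face Transformers API. The model id is from this card; the task string and generation settings are illustrative, and `build_messages`/`generate` are hypothetical helper names, not part of the release:

```python
# Usage sketch for OpenThinker-Agent-v1-SFT via Hugging Face Transformers.
# Model id is from this card; helper names and settings are illustrative.
MODEL_ID = "open-thoughts/OpenThinker-Agent-v1-SFT"

def build_messages(task: str) -> list[dict]:
    """Wrap a task description in the chat format the tokenizer template expects."""
    return [{"role": "user", "content": task}]

def generate(task: str, max_new_tokens: int = 512) -> str:
    """Load the model and produce a completion (downloads weights on first call)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy import
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(task), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

# Prompt construction alone is cheap and needs no model download:
print(build_messages("List files larger than 1 MB under /var/log."))
```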
|
|
|
|
|
- **Homepage:** https://www.openthoughts.ai/blog/agent |
|
|
- **Repository:** https://github.com/open-thoughts/OpenThoughts-Agent |
|
|
|
|
|
|
|
|
# OpenThinker-Agent-v1 Model Performance |
|
|
|
|
|
Our [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) model is the state-of-the-art model at its scale on agent benchmarks.
|
|
|
|
|
| Model | Harness | Terminal-Bench 2.0 | SWE-Bench Verified | OpenThoughts-TB-Dev | |
|
|
| ----------------------------------------------------------------------------------------------- | ------- | ------------------ | --------- | ------------------- | |
|
|
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | Terminus-2 | 0.0 | 0.7 | 5.7 | |
|
|
| **[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)** | Terminus-2 | 4.9 | 15.7 | 17.3 | |
|
|
| [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | Terminus-2 | 1.9 | 5.7 | 10.2 | |
|
|
| [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) | OpenHands | 10.1 | 49.2 | 24.5 | |
|
|
|
|
|
|
|
|
# Data |
|
|
|
|
|
We built [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) in two stages: **supervised fine-tuning**, followed by **reinforcement learning**. |
|
|
Each stage required its own data pipeline: RL tasks (instructions, environments, and verifiers), and SFT traces from strong teacher agents completing those tasks.
|
|
|
|
|
[OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) is an SFT trace dataset containing approximately **15,200 traces** drawn from two different data sources we curate: |
|
|
- **nl2bash**: Simple, synthetically generated tasks in which the agent must produce correctly formatted shell commands
|
|
- **InferredBugs**: A set of bugs in C# and Java collected by Microsoft that we turned into tasks |
|
|
|
|
|
[OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) is an RL dataset containing ~720 tasks drawn from the **nl2bash verified** dataset. |
|
|
|
|
|
To stabilize training, we built a three-stage filtration pipeline that prunes tasks before they ever hit the learner: |
|
|
|
|
|
1. Bad-verifier filter: drop tasks with flaky or excessively slow verifiers.


2. Environment stability: remove tasks whose containers take too long to build or tear down.


3. Optional difficulty filter: discard tasks that even a strong model (GPT-5 Codex) cannot solve in a single pass.
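The three filters above can be sketched as a single pruning pass. The task fields and thresholds below are illustrative assumptions, not the exact pipeline:

```python
# Sketch of the three-stage task filtration pass described above.
# Field names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    verifier_flaky: bool          # verifier gives inconsistent pass/fail results
    verifier_seconds: float       # wall-clock time of one verifier run
    env_build_seconds: float      # container build + teardown time
    solved_by_strong_model: bool  # e.g. solved in one pass by a frontier model

def keep_task(t: Task, *, max_verifier_s=60.0, max_env_s=300.0,
              drop_unsolvable=True) -> bool:
    # 1) Bad-verifier filter: flaky or excessively slow verifiers.
    if t.verifier_flaky or t.verifier_seconds > max_verifier_s:
        return False
    # 2) Environment stability: containers that build/tear down too slowly.
    if t.env_build_seconds > max_env_s:
        return False
    # 3) Optional difficulty filter: tasks no strong model can solve.
    if drop_unsolvable and not t.solved_by_strong_model:
        return False
    return True

tasks = [
    Task("ok", False, 5.0, 30.0, True),
    Task("flaky-verifier", True, 5.0, 30.0, True),
    Task("slow-env", False, 5.0, 900.0, True),
]
kept = [t.name for t in tasks if keep_task(t)]
print(kept)  # → ['ok']
```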
|
|
|
|
|
|
|
|
### Training hyperparameters |
|
|
|
|
|
The following hyperparameters were used during training: |
|
|
- learning_rate: 4e-05 |
|
|
- train_batch_size: 1 |
|
|
- eval_batch_size: 8 |
|
|
- seed: 42 |
|
|
- distributed_type: multi-GPU |
|
|
- num_devices: 16 |
|
|
- total_train_batch_size: 16 |
|
|
- total_eval_batch_size: 128 |
|
|
- optimizer: adamw_torch_fused with betas=(0.9, 0.98) and epsilon=1e-08
|
|
- lr_scheduler_type: cosine |
|
|
- lr_scheduler_warmup_ratio: 0.1 |
|
|
- num_epochs: 7.0 |
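For reference, the per-device numbers above combine into the listed totals (no gradient accumulation appears on this card). A sketch using Hugging Face `TrainingArguments`-style key names, which are an assumed mapping from the list above:

```python
# The hyperparameters above as a trainer-style config dict.
# Key names mirror Hugging Face TrainingArguments; the mapping is assumed.
config = {
    "learning_rate": 4e-5,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "num_devices": 16,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 7.0,
}

# Effective batch sizes: per-device batch size × number of devices.
total_train = config["per_device_train_batch_size"] * config["num_devices"]
total_eval = config["per_device_eval_batch_size"] * config["num_devices"]
print(total_train, total_eval)  # → 16 128
```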
|
|
|
|
|
### Framework versions |
|
|
|
|
|
- Transformers 4.56.0 |
|
|
- PyTorch 2.9.0+cu128
|
|
- Datasets 4.4.1 |
|
|
- Tokenizers 0.22.1 |
|
|
|
|
|
|
|
|
# Links |
|
|
- 🌐 [OpenThoughts-Agent project page](https://open-thoughts.ai/blog/agent)


- 💻 [OpenThoughts-Agent GitHub repository](https://github.com/open-thoughts/OpenThoughts-Agent)


- 🧠 [OpenThoughts-Agent-v1-SFT dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT)


- 🧠 [OpenThoughts-Agent-v1-RL dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL)


- 🧠 [OpenThoughts-TB-dev dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-TB-dev)


- 🤖 [OpenThinker-Agent-v1 model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)


- 🤖 [OpenThinker-Agent-v1-SFT model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT) (this model)
|
|
|
|
|
|
|
|
# Citation |
|
|
``` |
|
|
@misc{openthoughts-agent, |
|
|
author = {Team, OpenThoughts-Agent}, |
|
|
month = dec,
|
|
title = {{OpenThoughts-Agent}}, |
|
|
howpublished = {https://open-thoughts.ai/agent}, |
|
|
year = {2025} |
|
|
} |
|
|
``` |