# GPT-OSS 20B CUAD 16-bit (Contracts Assistant)
## Overview
This repository contains a full 16-bit merged version of a GPT-OSS 20B model that has been LoRA fine-tuned on CUAD (the Contract Understanding Atticus Dataset) for contract question answering and clause understanding.
- Base model: `unsloth/gpt-oss-20b`
- Original finetune: QLoRA with a LoRA adapter (see `SamerGMTM22/gpt-oss-20b-cuad-lora`)
- This repo: the LoRA adapter merged into the base model and saved as standard 16-bit safetensors
- Precision: 16-bit (bf16 in practice)
- Key use case: contract review assistance, clause question answering, and prototyping legal assistant workflows
This is the model to use if you want a single, standalone GPT-OSS 20B checkpoint that already includes the CUAD finetuning.
## Related models
- LoRA adapter only (small, 4-bit, Unsloth-based): `SamerGMTM22/gpt-oss-20b-cuad-lora`
- This repo (full 16-bit merged model): `SamerGMTM22/gpt-oss-20b-cuad-16bit`
Use this repo when:
- You want a complete model directory you can ship as-is.
- You have enough VRAM and disk to host a 20B 16-bit model.
- You prefer a single HF model that does not depend on a separate adapter.
## Background and journey
The project goal was to build a practical contracts assistant on top of GPT-OSS 20B:
- Specialize its behavior on contract language and clause patterns from CUAD.
- Keep the strong reasoning capabilities of GPT-OSS 20B.
- Produce both a lightweight adapter and a heavyweight merged model.
### Key steps and challenges
#### Dataset access (CUAD)

- The `theatticusproject/cuad-qa` dataset on Hugging Face uses a Python loader script.
- Newer versions of `datasets` no longer support `trust_remote_code` for such scripts.
- Fix:
  - Download the CUAD `data.zip` directly from the official GitHub repository.
  - Unzip it in the training environment.
  - Build a `datasets.Dataset` from the raw JSON files, as sketched below.
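A minimal sketch of that last step, assuming the SQuAD-style layout of `CUAD_v1.json` inside `data.zip`; the path and the field handling are illustrative, not the exact training script:

```python
import json
from datasets import Dataset

# Hypothetical path: data.zip from the CUAD GitHub repo unzipped into ./data.
with open("data/CUAD_v1.json") as f:
    raw = json.load(f)

records = []
for doc in raw["data"]:                 # SQuAD-style nesting: data -> paragraphs -> qas
    for para in doc["paragraphs"]:
        for qa in para["qas"]:
            answers = [a["text"] for a in qa.get("answers", [])]
            records.append({
                "question": qa["question"],
                "context": para["context"],
                "answer": answers[0] if answers else "",
            })

dataset = Dataset.from_list(records)
```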
#### Chat formatting and supervision

- GPT-OSS expects a Harmony-style chat format with system, user, and assistant roles.
- Each CUAD example was converted into:
  - System: "You are a senior legal assistant specializing in contract review."
  - User: "Question: …\n\nContract excerpt:\n…"
  - Assistant: the annotated answer span from CUAD.
- Early use of `train_on_responses_only` with the wrong markers masked all labels and produced zero training loss.
- Inspecting sample `input_ids` and `labels` was critical to confirm that only assistant tokens carry labels (see the check below).
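A minimal sketch of that inspection, assuming a tokenized dataset where masked positions are set to `-100` (the usual convention); `train_dataset` and `tokenizer` are placeholders:

```python
# Decode only the supervised positions; the result should be the assistant answer alone.
sample = train_dataset[0]  # hypothetical tokenized example with input_ids and labels
supervised = [t for t, l in zip(sample["input_ids"], sample["labels"]) if l != -100]
print(f"{len(supervised)} of {len(sample['labels'])} tokens carry labels")
print(tokenizer.decode(supervised))
```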
#### NaN loss and simplification
- NaNs appeared when label masking and configuration were misaligned.
- The final approach uses straightforward supervised finetuning on the assistant answer spans within the GPT-OSS chat template.
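A minimal sketch of how one training example looks under that approach, reusing the `dataset` built earlier; the exact rendering depends on the GPT-OSS chat template:

```python
example = dataset[0]  # from the Dataset built from the raw CUAD JSON

# The assistant turn is the supervised answer span; loss is computed on it only.
messages = [
    {"role": "system", "content": "You are a senior legal assistant specializing in contract review."},
    {"role": "user", "content": f"Question: {example['question']}\n\nContract excerpt:\n{example['context']}"},
    {"role": "assistant", "content": example["answer"]},
]
text = tokenizer.apply_chat_template(messages, tokenize=False)
```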
#### 16-bit merge and disk constraints

- Merging a 20B model into 16-bit weights is a heavy job: multiple 16+ GB shards.
- Colab sessions with little free disk repeatedly hit disk-full errors.
- The successful merge was done in a fresh session with ~75 GB free, in a clean sequence:
  1. Load the LoRA model from `SamerGMTM22/gpt-oss-20b-cuad-lora`.
  2. Use Unsloth's `save_pretrained_merged(..., save_method="merged_16bit")`.
  3. Upload the resulting directory as `SamerGMTM22/gpt-oss-20b-cuad-16bit`.
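That sequence, as a rough sketch using Unsloth's merged-save helpers; argument details may vary across Unsloth versions:

```python
from unsloth import FastLanguageModel

# 1. Load the LoRA checkpoint on top of its base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="SamerGMTM22/gpt-oss-20b-cuad-lora",
    max_seq_length=1024,
    load_in_4bit=False,
)

# 2. Merge the adapter into the base weights and save 16-bit safetensors.
model.save_pretrained_merged(
    "gpt-oss-20b-cuad-16bit",
    tokenizer,
    save_method="merged_16bit",
)

# 3. Upload the directory (or push directly with push_to_hub_merged).
```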
## Files in this repo
You will find, among others:
- `config.json`
- `model-00001-of-00002.safetensors`
- `model-00002-of-00002.safetensors` (exact shard count depends on the sharding layout)
- `model.safetensors.index.json`
- `tokenizer.json`
- `tokenizer_config.json`
- `special_tokens_map.json`
- `chat_template.jinja`
Combined, these files represent a complete 16-bit GPT-OSS 20B model, tuned on CUAD, with its tokenizer and chat template.
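To check the actual shard layout, the index file maps every tensor to its shard file; a quick sketch:

```python
import json

# weight_map maps tensor names to shard files in any sharded safetensors checkpoint.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

print(sorted(set(index["weight_map"].values())))
# e.g. ['model-00001-of-00002.safetensors', 'model-00002-of-00002.safetensors']
```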
## Intended use
This model is intended for:
- Answering natural language questions about contract excerpts
- Surfacing relevant clauses for human review
- Building prototypes of legal assistants for contract review
Typical questions you might ask:
- "What is the initial term of this agreement?"
- "Can either party terminate for convenience? Under what conditions?"
- "Is there a limitation of liability clause, and how is it structured?"
- "Who indemnifies whom, and for which kinds of claims?"
You must always manually verify the model’s answer against the contract text.
## Limitations
- Domain-limited: tuned only on CUAD, biased toward that style of contract and questions.
- No retrieval: does not search documents; you must provide the relevant text.
- Hallucinations: like any LLM, it may infer or invent unsupported details.
- Not legal advice: cannot replace the judgment of trained attorneys.
Use it as an assistant, not as a decision maker.
## How to use this model

### Recommended: load with Unsloth in 16-bit mode
Because GPT-OSS support is still evolving, the safest way to load this model today is via Unsloth, which patches GPT-OSS internals appropriately.
Install dependencies:

```bash
pip install "torch>=2.1.0" transformers
pip install "unsloth[base] @ git+https://github.com/unslothai/unsloth"
pip install "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo"
```
Load the model:

```python
from unsloth import FastLanguageModel
import torch

model_id = "SamerGMTM22/gpt-oss-20b-cuad-16bit"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_id,
    max_seq_length = 1024,
    dtype = torch.bfloat16,  # 16-bit compute
    load_in_4bit = False,    # full 16-bit weights
)
```
### Example: contract Q&A

```python
from transformers import TextStreamer

clause = """
This Agreement shall commence on the Effective Date and shall continue
for a period of three (3) years, unless earlier terminated in accordance
with the provisions herein.
"""

messages = [
    {
        "role": "system",
        "content": "You are a senior legal assistant specializing in contract review.",
    },
    {
        "role": "user",
        "content": (
            "Question: What is the initial term of this agreement?\n\n"
            "Contract excerpt:\n" + clause
        ),
    },
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
).to(model.device)

_ = model.generate(
    **inputs,
    max_new_tokens = 128,
    streamer = TextStreamer(tokenizer),
)
```
The answer should identify the initial term and may restate the relevant sentence.
### Note on plain transformers loading

GPT-OSS support in `transformers` and related tooling is evolving. Depending on versions, direct loading with `AutoModelForCausalLM.from_pretrained` may or may not work without additional patches.

If you run into router-related errors when using plain `AutoModelForCausalLM`, prefer the Unsloth loading path until the ecosystem stabilizes.
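A defensive loading sketch along those lines; the fallback trigger is illustrative, since the exact error depends on your `transformers` version:

```python
model_id = "SamerGMTM22/gpt-oss-20b-cuad-16bit"

try:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",
        device_map="auto",
    )
except Exception as err:  # e.g. router-related errors on some versions
    print(f"Plain transformers load failed ({err}); falling back to Unsloth.")
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_id,
        load_in_4bit=False,
    )
```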
## Safety and responsible use
- Do not treat this model’s outputs as legal advice.
- Always cross-check its answers against the actual contract text.
- Always involve a qualified attorney before acting on its outputs.
This model is best thought of as a specialized research tool for legal and procurement workflows, not as a replacement for professional judgment.
## Author and acknowledgements
- Finetuning and merging by: @SamerGMTM22
- Base model: GPT-OSS 20B via `unsloth/gpt-oss-20b`
- Dataset: CUAD (Contract Understanding Atticus Dataset) by The Atticus Project
Thanks to:
- The Atticus Project, for releasing CUAD.
- The GPT-OSS team, for open sourcing GPT-OSS 20B.
- The Unsloth authors, for making large-model finetuning and export feasible on single GPUs.
## Prompt format

Single user message; the excerpt is optional:

    Question: …
    Excerpt: …

If no excerpt:

    Question: …
## Inference snippet

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_id = "SamerGMTM22/gpt-oss-20b-cuad-16bit"

tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

prompt = (
    "Question: How many days' notice is required to terminate? "
    "Excerpt: This Agreement may be terminated by either party upon thirty (30) days' prior written notice."
)

messages = [
    {"role": "system", "content": "You are a legal assistant answering contract clause questions. Be concise; no chain-of-thought in the output."},
    {"role": "user", "content": prompt},
]

rendered = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tok(rendered, return_tensors="pt").to(model.device)

gen_cfg = GenerationConfig(max_new_tokens=200, do_sample=False)
outputs = model.generate(**inputs, generation_config=gen_cfg)

# Only decode the newly generated tokens
input_length = inputs.input_ids.shape[1]
print(tok.decode(outputs[0][input_length:], skip_special_tokens=True))
```
## Hardware notes

- Full 16-bit inference requires a large GPU (e.g., L40S/H100).
- For smaller GPUs, use a quantized variant or the LoRA adapter (load base + adapter; precision and footprint depend on the base), as sketched below.
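A minimal sketch of the adapter route with `peft`; quantization settings are up to you and omitted here:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model (optionally quantized, e.g. via a BitsAndBytesConfig),
# then attach the CUAD LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "SamerGMTM22/gpt-oss-20b-cuad-lora")
tokenizer = AutoTokenizer.from_pretrained("unsloth/gpt-oss-20b")
```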
## Repository
How-to and local relay clients: https://github.com/samerGMTM22/GPT-OSS-20B-Contracts
## Safety
- No chain-of-thought is shown in outputs.
- Answers rely on the provided excerpt; if insufficient, the model should say so.
## License / Data
- Base: GPT-OSS licensing (see openai/gpt-oss-20b).
- Training data: CUAD licensing applies.