GPT-OSS 20B CUAD 16 bit (Contracts Assistant)

Overview

This repository contains a full 16 bit merged version of a GPT-OSS 20B model that has been LoRA fine-tuned on CUAD (the Contract Understanding Atticus Dataset) for contract question answering and clause understanding.

  • Base model: unsloth/gpt-oss-20b
  • Original finetune: QLoRA with a LoRA adapter (see SamerGMTM22/gpt-oss-20b-cuad-lora)
  • This repo: LoRA merged into the base model and saved as standard 16 bit safetensors
  • Precision: 16 bit (bf16 in practice)
  • Key use case: contract review assistance, clause question answering, and prototyping legal assistant workflows

This is the model to use if you want a single, standalone GPT-OSS 20B checkpoint that already includes the CUAD finetuning.


Related models

  • LoRA adapter only (small, 4 bit, Unsloth-based):
    SamerGMTM22/gpt-oss-20b-cuad-lora

  • This repo (full 16 bit merged model):
    SamerGMTM22/gpt-oss-20b-cuad-16bit

Use this repo when:

  • You want a complete model directory you can ship as-is.
  • You have enough VRAM and disk to host a 20B 16 bit model.
  • You prefer a single HF model that does not depend on a separate adapter.

Background and journey

The project goal was to build a practical contracts assistant on top of GPT-OSS 20B:

  • Specialize its behavior on contract language and clause patterns from CUAD.
  • Keep the strong reasoning capabilities of GPT-OSS 20B.
  • Produce both a lightweight adapter and a heavyweight merged model.

Key steps and challenges

  1. Dataset access (CUAD)

    • The theatticusproject/cuad-qa dataset on Hugging Face uses a Python loader script.
    • Newer versions of datasets no longer support trust_remote_code for such scripts.
    • Fix (see the dataset sketch after this list):
      • Download the CUAD data.zip directly from the official GitHub repository.
      • Unzip it in the training environment.
      • Build a datasets.Dataset from the raw JSON files.
  2. Chat formatting and supervision

    • GPT-OSS expects a Harmony-style chat format with system, user, and assistant roles.
    • Each CUAD example was converted into:
      • System: "You are a senior legal assistant specializing in contract review."
      • User: "Question: …\n\nContract excerpt:\n…"
      • Assistant: the annotated answer span from CUAD.
    • An early attempt with train_on_responses_only used the wrong instruction/response markers, which masked every label and produced zero training loss.
    • Inspecting sample input_ids and labels was critical to confirm that only assistant tokens carry labels (see the formatting and label-check sketch after this list).
  3. NaN loss and simplification

    • NaN losses appeared when the label masking and the trainer configuration were misaligned.
    • The final approach uses straightforward supervised finetuning on the assistant answer spans within the GPT-OSS chat template.
  4. 16 bit merge and disk constraints

    • Merging a 20B model into 16 bit weights is a heavy job: multiple 16+ GB shards.
    • Colab sessions with little free disk repeatedly hit disk-full errors.
    • The successful merge was done in a fresh session with ~75 GB free, in a clean sequence (see the merge sketch after this list):
      • Load LoRA model from SamerGMTM22/gpt-oss-20b-cuad-lora.
      • Use Unsloth’s save_pretrained_merged(..., save_method="merged_16bit").
      • Upload the resulting directory as SamerGMTM22/gpt-oss-20b-cuad-16bit.
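
A minimal sketch of step 1, assuming the official data.zip has been unzipped so that CUAD_v1.json (SQuAD-style layout) sits in the working directory; the field names follow the published CUAD schema:

import json
from datasets import Dataset

with open("CUAD_v1.json") as f:
    raw = json.load(f)

records = []
for doc in raw["data"]:
    for paragraph in doc["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            # CUAD marks some questions as unanswerable; keep them with an empty answer.
            answers = [a["text"] for a in qa["answers"]]
            records.append({
                "question": qa["question"],
                "context": context,
                "answer": answers[0] if answers else "",
            })

dataset = Dataset.from_list(records)
print(dataset)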
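
A minimal sketch of step 2, reusing the record fields from the sketch above; the system and user strings match the ones described earlier, and the label check assumes a tokenized example where masked positions use the conventional -100:

def to_messages(record):
    return [
        {"role": "system",
         "content": "You are a senior legal assistant specializing in contract review."},
        {"role": "user",
         "content": "Question: " + record["question"]
                    + "\n\nContract excerpt:\n" + record["context"]},
        {"role": "assistant", "content": record["answer"]},
    ]

# Sanity check that only assistant tokens carry labels:
def show_label_mask(input_ids, labels, tokenizer):
    for token_id, label in zip(input_ids, labels):
        tag = "TRAIN" if label != -100 else "mask"
        print(f"{tag:5s} {tokenizer.decode([token_id])!r}")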
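
A minimal sketch of step 4, assuming a fresh Unsloth session with roughly 75 GB of free disk; save_pretrained_merged is the Unsloth export helper named above:

from unsloth import FastLanguageModel

# Load the finetuned LoRA checkpoint (Unsloth resolves the base model
# from the adapter config).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "SamerGMTM22/gpt-oss-20b-cuad-lora",
    max_seq_length = 1024,
)

# Merge the adapter into the base weights and write full 16 bit shards.
model.save_pretrained_merged(
    "gpt-oss-20b-cuad-16bit",
    tokenizer,
    save_method = "merged_16bit",
)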

Files in this repo

You will find, among others:

  • config.json
  • model-00001-of-00002.safetensors
  • model-00002-of-00002.safetensors (the exact shard count and names depend on the sharding layout)
  • model.safetensors.index.json
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • chat_template.jinja

Combined, these files represent a complete 16 bit GPT-OSS 20B model, tuned on CUAD, with its tokenizer and chat template.
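
To confirm the exact layout of the published repo (shard names can differ between exports), a quick listing via huggingface_hub:

from huggingface_hub import list_repo_files

for name in list_repo_files("SamerGMTM22/gpt-oss-20b-cuad-16bit"):
    print(name)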


Intended use

This model is intended for:

  • Answering natural language questions about contract excerpts
  • Surfacing relevant clauses for human review
  • Building prototypes of legal assistants for contract review

Typical questions you might ask:

  • "What is the initial term of this agreement?"
  • "Can either party terminate for convenience? Under what conditions?"
  • "Is there a limitation of liability clause, and how is it structured?"
  • "Who indemnifies whom, and for which kinds of claims?"

You must always manually verify the model’s answer against the contract text.


Limitations

  • Domain-limited: tuned only on CUAD, so it is biased toward that dataset's contract styles and question types.
  • No retrieval: does not search documents; you must provide the relevant text.
  • Hallucinations: like any LLM, it may infer or invent unsupported details.
  • Not legal advice: cannot replace the judgment of trained attorneys.

Use it as an assistant, not as a decision maker.


How to use this model

Recommended: load with Unsloth in 16 bit mode

Because GPT-OSS support is still evolving, the safest way to load this model today is via Unsloth, which patches GPT-OSS internals appropriately.

Install dependencies:

pip install "torch>=2.1.0" transformers
pip install "unsloth[base] @ git+https://github.com/unslothai/unsloth"
pip install "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo"

Load the model:

from unsloth import FastLanguageModel
import torch

model_id = "SamerGMTM22/gpt-oss-20b-cuad-16bit"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = model_id,
    max_seq_length = 1024,
    dtype          = torch.bfloat16,   # 16 bit compute
    load_in_4bit   = False,            # full 16 bit weights
)

Example: contract Q&A

from transformers import TextStreamer

clause = """
This Agreement shall commence on the Effective Date and shall continue
for a period of three (3) years, unless earlier terminated in accordance
with the provisions herein.
"""

messages = [
    {
        "role": "system",
        "content": "You are a senior legal assistant specializing in contract review.",
    },
    {
        "role": "user",
        "content": (
            "Question: What is the initial term of this agreement?\n\n"
            "Contract excerpt:\n" + clause
        ),
    },
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
).to(model.device)

_ = model.generate(
    **inputs,
    max_new_tokens = 128,
    streamer       = TextStreamer(tokenizer),
)

The answer should identify the initial term and may restate the relevant sentence.

Note on plain transformers loading

GPT-OSS support in transformers and related tooling is evolving. Depending on versions, direct loading with AutoModelForCausalLM.from_pretrained may or may not work without additional patches.

If you run into router-related errors when using plain AutoModelForCausalLM, prefer the Unsloth loading path until the ecosystem stabilizes.


Safety and responsible use

  • Do not treat this model’s outputs as legal advice.
  • Always cross-check its answers against the actual contract text.
  • Always involve a qualified attorney before acting on its outputs.

This model is best thought of as a specialized research tool for legal and procurement workflows, not as a replacement for professional judgment.


Author and acknowledgements

  • Finetuning and merging by: @SamerGMTM22
  • Base model: GPT-OSS 20B via unsloth/gpt-oss-20b
  • Dataset: CUAD (Contract Understanding Atticus Dataset) by The Atticus Project

Thanks to:

  • The Atticus Project, for releasing CUAD.
  • The GPT-OSS team, for open sourcing GPT-OSS 20B.
  • The Unsloth authors, for making large-model finetuning and export feasible on single GPUs.

Prompt format

The user turn is a single message, and the contract excerpt is optional:

  • With an excerpt: "Question: … Excerpt: …"
  • Without an excerpt: "Question: …"

Inference snippet

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_id = "SamerGMTM22/gpt-oss-20b-cuad-16bit"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
prompt = (
    "Question: How many days' notice is required to terminate? "
    "Excerpt: This Agreement may be terminated by either party upon thirty (30) days' prior written notice."
)
messages = [
    {"role": "system", "content": "You are a legal assistant answering contract clause questions. Be concise; no chain-of-thought in the output."},
    {"role": "user", "content": prompt},
]
rendered = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tok(rendered, return_tensors="pt").to(model.device)
gen_cfg = GenerationConfig(max_new_tokens=200, do_sample=False)
outputs = model.generate(**inputs, generation_config=gen_cfg)
# Only decode the newly generated tokens
input_length = inputs.input_ids.shape[1]
print(tok.decode(outputs[0][input_length:], skip_special_tokens=True))

Hardware notes

  • Full 16 bit inference requires a large GPU (e.g., an L40S or H100).
  • For smaller GPUs, use a quantized variant or the LoRA adapter (load the base model plus the adapter; precision and footprint depend on how the base is loaded).
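
As a rough estimate, 20B parameters at 2 bytes each (bf16) come to about 40 GB for the weights alone, before KV cache and activations, so a single 24 GB card will not hold the full merged model.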

Repository

How-to and local relay clients: https://github.com/samerGMTM22/GPT-OSS-20B-Contracts

Safety

  • No chain-of-thought is shown in outputs.
  • Answers rely on the provided excerpt; if insufficient, the model should say so.

License / Data

  • Base: GPT-OSS licensing (see openai/gpt-oss-20b).
  • Training data: CUAD licensing applies.