GPT-OSS 20B CUAD 16 bit (Contracts Assistant)

Overview

This repository contains a full 16 bit merged version of a GPT-OSS 20B model that has been LoRA fine-tuned on CUAD (the Contract Understanding Atticus Dataset) for contract question answering and clause understanding.

  • Base model: unsloth/gpt-oss-20b
  • Original finetune: QLoRA with a LoRA adapter (see SamerGMTM22/gpt-oss-20b-cuad-lora)
  • This repo: LoRA merged into the base model and saved as standard 16 bit safetensors
  • Precision: 16 bit (bf16 in practice)
  • Key use case: contract review assistance, clause question answering, and prototyping legal assistant workflows

This is the model to use if you want a single, standalone GPT-OSS 20B checkpoint that already includes the CUAD finetuning.


Related models

  • LoRA adapter only (small, 4 bit, Unsloth-based):
    SamerGMTM22/gpt-oss-20b-cuad-lora

  • This repo (full 16 bit merged model):
    SamerGMTM22/gpt-oss-20b-cuad-16bit

Use this repo when:

  • You want a complete model directory you can ship as-is.
  • You have enough VRAM and disk to host a 20B 16 bit model.
  • You prefer a single HF model that does not depend on a separate adapter.

Background and journey

The project goal was to build a practical contracts assistant on top of GPT-OSS 20B:

  • Specialize its behavior on contract language and clause patterns from CUAD.
  • Keep the strong reasoning capabilities of GPT-OSS 20B.
  • Produce both a lightweight adapter and a heavyweight merged model.

Key steps and challenges

  1. Dataset access (CUAD)

    • The theatticusproject/cuad-qa dataset on Hugging Face uses a Python loader script.
    • Newer versions of datasets no longer support trust_remote_code for such scripts.
    • Fix (see the dataset sketch after this list):
      • Download the CUAD data.zip directly from the official GitHub repository.
      • Unzip it in the training environment.
      • Build a datasets.Dataset from the raw JSON files.
  2. Chat formatting and supervision

    • GPT-OSS expects a Harmony-style chat format with system, user, and assistant roles.
    • Each CUAD example was converted into:
      • System: "You are a senior legal assistant specializing in contract review."
      • User: "Question: …\n\nContract excerpt:\n…"
      • Assistant: the annotated answer span from CUAD.
    • An early attempt with train_on_responses_only used the wrong instruction/response markers, which masked every label and produced zero training loss.
    • Inspecting sample input_ids and labels was critical to confirm that only assistant tokens carry labels (see the formatting and label-check sketch after this list).
  3. NaN loss and simplification

    • NaN losses appeared when the label masking and the trainer configuration were misaligned.
    • The final approach uses straightforward supervised finetuning on the assistant answer spans within the GPT-OSS chat template.
  4. 16 bit merge and disk constraints

    • Merging a 20B model into 16 bit weights is a heavy job: multiple 16+ GB shards.
    • Colab sessions with little free disk repeatedly hit disk-full errors.
    • The successful merge was done in a fresh session with ~75 GB free, in a clean sequence (see the merge sketch after this list):
      • Load LoRA model from SamerGMTM22/gpt-oss-20b-cuad-lora.
      • Use Unsloth’s save_pretrained_merged(..., save_method="merged_16bit").
      • Upload the resulting directory as SamerGMTM22/gpt-oss-20b-cuad-16bit.
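
A minimal sketch of step 1, assuming the official data.zip has been unzipped so that CUAD_v1.json (SQuAD-style layout) sits in the working directory; the field names follow the published CUAD schema:

import json
from datasets import Dataset

with open("CUAD_v1.json") as f:
    raw = json.load(f)

records = []
for doc in raw["data"]:
    for paragraph in doc["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            # CUAD marks some questions as unanswerable; keep them with an empty answer.
            answers = [a["text"] for a in qa["answers"]]
            records.append({
                "question": qa["question"],
                "context": context,
                "answer": answers[0] if answers else "",
            })

dataset = Dataset.from_list(records)
print(dataset)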
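
A minimal sketch of step 2, reusing the record fields from the sketch above; the system and user strings match the ones described earlier, and the label check assumes a tokenized example where masked positions use the conventional -100:

def to_messages(record):
    return [
        {"role": "system",
         "content": "You are a senior legal assistant specializing in contract review."},
        {"role": "user",
         "content": "Question: " + record["question"]
                    + "\n\nContract excerpt:\n" + record["context"]},
        {"role": "assistant", "content": record["answer"]},
    ]

# Sanity check that only assistant tokens carry labels:
def show_label_mask(input_ids, labels, tokenizer):
    for token_id, label in zip(input_ids, labels):
        tag = "TRAIN" if label != -100 else "mask"
        print(f"{tag:5s} {tokenizer.decode([token_id])!r}")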
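
A minimal sketch of step 4, assuming a fresh Unsloth session with roughly 75 GB of free disk; save_pretrained_merged is the Unsloth export helper named above:

from unsloth import FastLanguageModel

# Load the finetuned LoRA checkpoint (Unsloth resolves the base model
# from the adapter config).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "SamerGMTM22/gpt-oss-20b-cuad-lora",
    max_seq_length = 1024,
)

# Merge the adapter into the base weights and write full 16 bit shards.
model.save_pretrained_merged(
    "gpt-oss-20b-cuad-16bit",
    tokenizer,
    save_method = "merged_16bit",
)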

Files in this repo

You will find, among others:

  • config.json
  • model-00001-of-00002.safetensors
  • model-00002-of-00002.safetensors (the exact shard count and names depend on the sharding layout)
  • model.safetensors.index.json
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • chat_template.jinja

Combined, these files represent a complete 16 bit GPT-OSS 20B model, tuned on CUAD, with its tokenizer and chat template.
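
To confirm the exact layout of the published repo (shard names can differ between exports), a quick listing via huggingface_hub:

from huggingface_hub import list_repo_files

for name in list_repo_files("SamerGMTM22/gpt-oss-20b-cuad-16bit"):
    print(name)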


Intended use

This model is intended for:

  • Answering natural language questions about contract excerpts
  • Surfacing relevant clauses for human review
  • Building prototypes of legal assistants for contract review

Typical questions you might ask:

  • "What is the initial term of this agreement?"
  • "Can either party terminate for convenience? Under what conditions?"
  • "Is there a limitation of liability clause, and how is it structured?"
  • "Who indemnifies whom, and for which kinds of claims?"

You must always manually verify the model’s answer against the contract text.


Limitations

  • Domain-limited: tuned only on CUAD, so it is biased toward that dataset's contract styles and question types.
  • No retrieval: does not search documents; you must provide the relevant text.
  • Hallucinations: like any LLM, it may infer or invent unsupported details.
  • Not legal advice: cannot replace the judgment of trained attorneys.

Use it as an assistant, not as a decision maker.


How to use this model

Recommended: load with Unsloth in 16 bit mode

Because GPT-OSS support is still evolving, the safest way to load this model today is via Unsloth, which patches GPT-OSS internals appropriately.

Install dependencies:

pip install "torch>=2.1.0" transformers
pip install "unsloth[base] @ git+https://github.com/unslothai/unsloth"
pip install "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo"

Load the model:

from unsloth import FastLanguageModel
import torch

model_id = "SamerGMTM22/gpt-oss-20b-cuad-16bit"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = model_id,
    max_seq_length = 1024,
    dtype          = torch.bfloat16,   # 16 bit compute
    load_in_4bit   = False,            # full 16 bit weights
)

Example: contract Q&A

from transformers import TextStreamer

clause = """
This Agreement shall commence on the Effective Date and shall continue
for a period of three (3) years, unless earlier terminated in accordance
with the provisions herein.
"""

messages = [
    {
        "role": "system",
        "content": "You are a senior legal assistant specializing in contract review.",
    },
    {
        "role": "user",
        "content": (
            "Question: What is the initial term of this agreement?\n\n"
            "Contract excerpt:\n" + clause
        ),
    },
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
).to(model.device)

_ = model.generate(
    **inputs,
    max_new_tokens = 128,
    streamer       = TextStreamer(tokenizer),
)

The answer should identify the initial term and may restate the relevant sentence.

Note on plain transformers loading

GPT-OSS support in transformers and related tooling is evolving. Depending on versions, direct loading with AutoModelForCausalLM.from_pretrained may or may not work without additional patches.

If you run into router-related errors when using plain AutoModelForCausalLM, prefer the Unsloth loading path until the ecosystem stabilizes.


Safety and responsible use

  • Do not treat this model’s outputs as legal advice.
  • Always cross-check its answers against the actual contract text.
  • Always involve a qualified attorney before acting on its outputs.

This model is best thought of as a specialized research tool for legal and procurement workflows, not as a replacement for professional judgment.


Author and acknowledgements

  • Finetuning and merging by: @SamerGMTM22
  • Base model: GPT-OSS 20B via unsloth/gpt-oss-20b
  • Dataset: CUAD (Contract Understanding Atticus Dataset) by The Atticus Project

Thanks to:

  • The Atticus Project, for releasing CUAD.
  • The GPT-OSS team, for open sourcing GPT-OSS 20B.
  • The Unsloth authors, for making large-model finetuning and export feasible on single GPUs.

Prompt format

The user turn is a single message, and the contract excerpt is optional:

  • With an excerpt: "Question: … Excerpt: …"
  • Without an excerpt: "Question: …"

Inference snippet

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_id = "SamerGMTM22/gpt-oss-20b-cuad-16bit"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
prompt = (
    "Question: How many days' notice is required to terminate? "
    "Excerpt: This Agreement may be terminated by either party upon thirty (30) days' prior written notice."
)
messages = [
    {"role": "system", "content": "You are a legal assistant answering contract clause questions. Be concise; no chain-of-thought in the output."},
    {"role": "user", "content": prompt},
]
rendered = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tok(rendered, return_tensors="pt").to(model.device)
gen_cfg = GenerationConfig(max_new_tokens=200, do_sample=False)
outputs = model.generate(**inputs, generation_config=gen_cfg)
# Only decode the newly generated tokens
input_length = inputs.input_ids.shape[1]
print(tok.decode(outputs[0][input_length:], skip_special_tokens=True))

Hardware notes

  • Full 16 bit inference requires a large GPU (e.g., an L40S or H100).
  • For smaller GPUs, use a quantized variant or the LoRA adapter (load the base model plus the adapter; precision and footprint depend on how the base is loaded).
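
As a rough estimate, 20B parameters at 2 bytes each (bf16) come to about 40 GB for the weights alone, before KV cache and activations, so a single 24 GB card will not hold the full merged model.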

Repository

How-to and local relay clients: https://github.com/samerGMTM22/GPT-OSS-20B-Contracts

Safety

  • No chain-of-thought is shown in outputs.
  • Answers rely on the provided excerpt; if insufficient, the model should say so.

License / Data

  • Base: GPT-OSS licensing (see openai/gpt-oss-20b).
  • Training data: CUAD licensing applies.