Qwen3-0.6B Fine-tuned on Aegis AI Content Safety

Model Description

This model is Qwen3-0.6B-Base fine-tuned for content safety classification and moderation tasks. Qwen3-0.6B is the smallest model in Alibaba Cloud's Qwen3 family and is designed for efficient deployment.

This model was fine-tuned using LoRA (Low-Rank Adaptation) on the NVIDIA Aegis AI Content Safety Dataset 2.0, which contains diverse examples of safe and unsafe content across multiple categories.
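LoRA keeps each pretrained weight matrix W₀ frozen and learns a low-rank update instead, so an adapted projection computes h = W₀x + (α/r)·B(Ax), where A is r × d_in and B is d_out × r. The dependency-free sketch below illustrates that forward pass with toy 2 × 2 matrices and the α = 32, r = 16 values from the Training Configuration. It is a generic illustration of the technique, not the PEFT implementation. Because standard LoRA initializes B to zero, the adapted layer starts out identical to the base layer.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w0, a, b, x, alpha, r):
    """h = W0 x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(w0, x)                 # frozen pretrained projection
    low_rank = matvec(b, matvec(a, x))   # rank-r update B(Ax)
    scale = alpha / r                    # 32 / 16 = 2.0 for this card
    return [h + scale * d for h, d in zip(base, low_rank)]


# Toy shapes: d_in = d_out = 2, rank r = 1 (real layers are far larger)
w0 = [[1.0, 0.0], [0.0, 1.0]]
a = [[0.5, 0.5]]          # r x d_in
b_init = [[0.0], [0.0]]   # d_out x r, zero-initialised as in standard LoRA
x = [2.0, 4.0]

print(lora_forward(w0, a, b_init, x, alpha=32, r=16))  # [2.0, 4.0]

b_trained = [[0.1], [-0.1]]  # hypothetical values after some training
print(lora_forward(w0, a, b_trained, x, alpha=32, r=16))
```

Only A and B (here 4 numbers instead of the 4 in W₀, but r·(d_in + d_out) versus d_in·d_out at real sizes) receive gradients, which is what makes LoRA cheap to train and store.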

Model Details

Base Model: Qwen/Qwen3-0.6B-Base
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Dataset: NVIDIA Aegis AI Content Safety Dataset 2.0
Training Samples: 2,000 (a subset of the dataset)
Language: English
License: Apache 2.0

Capabilities

Content safety classification
Toxic content detection
Harmful content identification
Safety-aware text generation
Content moderation assistance

Intended Use Cases

Content moderation systems
Chat application safety filters
User-generated content screening
Educational content filtering
Social media safety monitoring

Training Configuration

LoRA Parameters

Rank (r): 16
Alpha: 32
Dropout: 0.05
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

Training Hyperparameters

Learning Rate: 2e-4
Batch Size: 4 (per device)
Gradient Accumulation Steps: 4
Effective Batch Size: 16
Epochs: 3
Optimizer: AdamW (8-bit paged)
LR Scheduler: Cosine with warmup
Warmup Ratio: 0.1
FP16 Training: Yes
Max Sequence Length: 512
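The hyperparameters above determine how many optimizer steps the run takes and how the learning rate moves over them. The plain-Python sketch below reproduces that arithmetic and a cosine schedule with linear warmup, following the shape of transformers' `get_cosine_schedule_with_warmup` with its default half cosine cycle. It assumes all 2,000 samples were used for training on a single device; the step counting is an approximation of how the Trainer counts optimizer steps, not a copy of it.

```python
import math

# Values from the Training Configuration above.
NUM_SAMPLES = 2000   # ASSUMPTION: every sample is in the training split
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
NUM_DEVICES = 1      # ASSUMPTION: single device, so 4 x 4 = 16 effective
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
batches_per_epoch = math.ceil(NUM_SAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))
steps_per_epoch = batches_per_epoch // GRAD_ACCUM   # optimizer updates per epoch
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def cosine_with_warmup(step):
    """Linear warmup from 0 to PEAK_LR, then half-cosine decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 16 125 375 38
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{cosine_with_warmup(step):.2e}")
```

Under these assumptions the run is short (375 updates), so the 10% warmup covers roughly the first third of the first epoch.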

Usage

Installation

```bash
pip install transformers torch peft accelerate
```

Basic Usage

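```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer.
# If the repository holds only LoRA adapter weights, transformers loads the
# base model and attaches the adapter automatically, provided `peft` is installed.
model_name = "ahczhg/qwen3-0.6b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"  # requires `accelerate`
)

# Example: content safety check
prompt = "### Instruction:\nAnalyze this content for safety: 'Your text here'\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Sampling settings as published; for reproducible labels,
    # greedy decoding (do_sample=False) is an alternative.
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.7,
        do_sample=True,
        top_p=0.95
    )

# Decode only the newly generated tokens, dropping the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
```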
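The response is free text, so a moderation system usually needs to turn it into a discrete label. The sketch below wraps the prompt template from the Basic Usage example in a helper and maps the output to "unsafe", "safe", or "unknown" with a keyword heuristic. Both the assumption that this template matches the fine-tuning format and the keyword rules are hypothetical, since the card does not document the model's exact output format.

```python
import re

# ASSUMPTION: same template as the Basic Usage example above.
PROMPT_TEMPLATE = (
    "### Instruction:\nAnalyze this content for safety: '{text}'\n\n### Response:\n"
)


def build_prompt(text: str) -> str:
    """Insert user content into the instruction template."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_safety_label(response: str) -> str:
    """Hypothetical keyword heuristic mapping free-text output to a label.

    Returns "unsafe", "safe", or "unknown". Negative phrasings are checked
    first so that "not safe" is not mistaken for "safe".
    """
    lowered = response.lower()
    if re.search(r"\b(unsafe|not safe)\b", lowered):
        return "unsafe"
    if re.search(r"\bsafe\b", lowered):
        return "safe"
    return "unknown"


print(build_prompt("Hello, how are you?"))
print(parse_safety_label("The content is unsafe: it contains a threat."))  # unsafe
print(parse_safety_label("This message is safe."))                         # safe
```

Outputs that map to "unknown" are safer to escalate to a human than to treat as safe by default.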

Usage with Pipeline

```python
from transformers import pipeline
import torch

# Create a text-generation pipeline (device_map="auto" requires `accelerate`)
generator = pipeline(
    "text-generation",
    model="ahczhg/qwen3-0.6b-aegis-safety-lora",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate a safety analysis
result = generator(
    "### Instruction:\nIs this content safe? 'Hello, how are you?'\n\n### Response:\n",
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True,
    return_full_text=False  # return only the generated response, not the prompt
)

print(result[0]['generated_text'])
```

Evaluation

The model was evaluated on a held-out test set from the Aegis AI Content Safety Dataset 2.0 using the following metrics:

Perplexity on validation set
Content safety classification accuracy
False positive/negative rates for harmful content detection
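The metrics above can be computed from model predictions and per-token losses. The sketch below treats "unsafe" as the positive class (an assumption), so a false positive is safe content flagged as unsafe and a false negative is harmful content that was missed. The toy labels are for illustration only and are not results from this model.

```python
import math


def classification_rates(y_true, y_pred, positive="unsafe"):
    """Accuracy plus false positive / false negative rates for a binary safety task.

    Any prediction other than `positive`, including "unknown",
    counts as a negative prediction here.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,  # safe content flagged
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,  # harmful content missed
    }


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy labels for illustration only
y_true = ["unsafe", "safe", "safe", "unsafe", "safe"]
y_pred = ["unsafe", "unsafe", "safe", "safe", "safe"]
print(classification_rates(y_true, y_pred))
print(perplexity([math.log(4.0)] * 3))
```

For a causal language model in transformers, passing `labels=input_ids` to the forward call returns `.loss` as the mean token negative log-likelihood, so `math.exp(loss.item())` gives the perplexity of that batch.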

Limitations

The model was trained primarily on English-language content
Performance may vary on domain-specific or highly technical content
Should be used as part of a comprehensive content moderation system, not as the sole decision-maker
May require further fine-tuning for specific use cases or content domains
The model's outputs should be reviewed by human moderators for critical applications
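Since the model should not be the sole decision-maker, one way to integrate it is to let it triage rather than act. The sketch below is a hypothetical routing policy: it assumes the model's response has already been mapped to "safe", "unsafe", or "unknown", auto-allows only an explicit "safe", and sends everything else to a human moderator. The class and function names are illustrative, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class ModerationDecision:
    action: str  # "allow" or "human_review"
    reason: str


def route(label: str) -> ModerationDecision:
    """Hypothetical policy: the model never removes content on its own.

    Only an explicit "safe" label is auto-allowed; flagged or uninterpretable
    outputs are queued for a human moderator, who makes the final call.
    """
    if label == "safe":
        return ModerationDecision("allow", "model labelled the content safe")
    if label == "unsafe":
        return ModerationDecision("human_review", "model flagged the content")
    return ModerationDecision("human_review", "model output could not be interpreted")


for label in ("safe", "unsafe", "unknown"):
    print(label, "->", route(label).action)
```

Because false negatives still pass through the "allow" branch, stricter deployments may also send a random sample of allowed items for human review.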

Ethical Considerations

This model is designed to assist in content safety and moderation tasks
It should not be used to censor legitimate speech or suppress diverse viewpoints
Decisions about content moderation should involve human oversight
The model may reflect biases present in the training data
Users should implement appropriate safeguards and appeal processes

Training Data

The model was fine-tuned on the NVIDIA Aegis AI Content Safety Dataset 2.0, which includes:

Diverse examples of safe and unsafe content
Multiple categories of potentially harmful content
Balanced representation of safe content
Real-world scenarios and edge cases

Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{qwen3_0.6b_aegis_safety,
  author = {ahczhg},
  title = {Qwen3-0.6B Fine-tuned on Aegis AI Content Safety},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora}},
  note = {Fine-tuned on NVIDIA Aegis AI Content Safety Dataset 2.0}
}
```

Acknowledgments

Base model: Qwen3-0.6B-Base by the Qwen team
Dataset provided by NVIDIA
Fine-tuning performed using HuggingFace Transformers and PEFT libraries

Contact

For questions, issues, or feedback, please visit the model repository.

Model Card Authors

ahczhg

Model Card Contact

https://huggingface.co/ahczhg


Model Files

Format: Safetensors
Model size: 0.6B params
Tensor type: F16
