Qwen3-0.6B Fine-tuned on Aegis AI Content Safety
Model Description
This is a fine-tuned version of Qwen3-0.6B-Base, optimized for content safety classification and moderation tasks. Qwen3 is a family of language models developed by the Qwen team at Alibaba Cloud; the 0.6B variant is small enough for efficient deployment.
This model was fine-tuned using LoRA (Low-Rank Adaptation) on the NVIDIA Aegis AI Content Safety Dataset 2.0, which contains diverse examples of safe and unsafe content across multiple categories.
Model Details
- Base Model: Qwen/Qwen3-0.6B-Base
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Dataset: NVIDIA Aegis AI Content Safety Dataset 2.0
- Training Samples: 2,000 carefully selected samples
- Language: English
- License: Apache 2.0
Capabilities
- Content safety classification
- Toxic content detection
- Harmful content identification
- Safety-aware text generation
- Content moderation assistance
Intended Use Cases
- Content moderation systems
- Chat application safety filters
- User-generated content screening
- Educational content filtering
- Social media safety monitoring
Training Configuration
LoRA Parameters
- Rank (r): 16
- Alpha: 32
- Dropout: 0.05
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
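As a rough sketch, the LoRA settings above could be expressed as a PEFT `LoraConfig` along the following lines. The `task_type` value and the helper name `build_lora_config` are assumptions for illustration; the card does not state how the configuration was actually written. PEFT scales the low-rank update by alpha / r by default, so these settings correspond to a scaling factor of 2.

```python
# LoRA settings taken from the list above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Assumption: causal-LM task type; not stated in the card.
    "task_type": "CAUSAL_LM",
}

# Default LoRA scaling applied to the low-rank update: alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Build a PEFT LoraConfig from the settings above (requires `peft`)."""
    # Imported lazily so the constants above can be inspected without peft.
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


if __name__ == "__main__":
    print(f"LoRA scaling (alpha / r): {LORA_SCALING}")
```

Targeting all attention and MLP projections, as listed, adapts every linear layer in each transformer block rather than only the attention weights.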
Training Hyperparameters
- Learning Rate: 2e-4
- Batch Size: 4 (per device)
- Gradient Accumulation Steps: 4
- Effective Batch Size: 16
- Epochs: 3
- Optimizer: AdamW (8-bit paged)
- LR Scheduler: Cosine with warmup
- Warmup Ratio: 0.1
- FP16 Training: Yes
- Max Sequence Length: 512
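The hyperparameters above imply a specific number of optimizer steps. The sketch below works through that arithmetic and a cosine schedule with linear warmup. It assumes a single device, that all 2,000 samples are used for training, and that warmup steps are rounded up; none of these are stated in the card, and the function name `cosine_lr` is illustrative. Under those assumptions there are 125 steps per epoch, 375 steps in total, and 38 warmup steps.

```python
import math

# Values from the lists above.
NUM_SAMPLES = 2_000
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1

# Assumption: a single device, so the effective batch size is 4 * 4 = 16.
NUM_DEVICES = 1
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES

# Assumption: all samples are used for training; a partial batch counts as a step.
STEPS_PER_EPOCH = math.ceil(NUM_SAMPLES / EFFECTIVE_BATCH)
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def cosine_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"steps/epoch={STEPS_PER_EPOCH}, total={TOTAL_STEPS}, warmup={WARMUP_STEPS}")
    for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, f"{cosine_lr(step):.2e}")
```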
Usage
Installation
pip install transformers torch peft accelerate
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ahczhg/qwen3-0.6b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example: Content safety check
prompt = "### Instruction:\nAnalyze this content for safety: 'Your text here'\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate without tracking gradients (inference only)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.7,
        do_sample=True,
        top_p=0.95
    )

# The decoded text contains the prompt followed by the model's response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
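Because the decoded output above contains the prompt followed by the generated text, it can help to build the prompt and strip it back off in one place. The helpers below are a minimal sketch; the names `build_prompt` and `extract_response` are hypothetical and not part of the model repository, and the template simply mirrors the prompt string used in the example.

```python
RESPONSE_MARKER = "### Response:\n"


def build_prompt(text: str) -> str:
    """Wrap `text` in the instruction template used in the example above."""
    return (
        "### Instruction:\n"
        f"Analyze this content for safety: '{text}'\n\n"
        f"{RESPONSE_MARKER}"
    )


def extract_response(decoded: str, prompt: str) -> str:
    """Return only the generated continuation from a decoded output string."""
    # Common case: the decoded string starts with the exact prompt.
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    # Fallback: keep whatever follows the first response marker, if present.
    _, marker, tail = decoded.partition(RESPONSE_MARKER.strip())
    return tail.strip() if marker else decoded.strip()


if __name__ == "__main__":
    prompt = build_prompt("Your text here")
    fake_decoded = prompt + "This is a placeholder model reply."
    print(extract_response(fake_decoded, prompt))
```

An alternative that avoids string matching is to decode only the newly generated tokens, for example `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.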
Advanced Usage with Pipeline
from transformers import pipeline
import torch  # needed for torch.float16 below

# Create text generation pipeline
generator = pipeline(
    "text-generation",
    model="ahczhg/qwen3-0.6b-aegis-safety-lora",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate safety analysis
result = generator(
    "### Instruction:\nIs this content safe? 'Hello, how are you?'\n\n### Response:\n",
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True
)

# generated_text contains the prompt followed by the model's response
print(result[0]['generated_text'])
Evaluation
The model was evaluated on a held-out test set from the NVIDIA Aegis AI Content Safety Dataset 2.0 using the following metrics:
- Perplexity on the validation set
- Content safety classification accuracy
- False positive/negative rates for harmful content detection
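For reference, the metrics listed above can be computed as in the following sketch. It treats "unsafe" as the positive class and uses the label strings "safe" and "unsafe"; both choices, and the function names, are assumptions for illustration. It does not reproduce any results for this model.

```python
import math
from typing import Dict, Sequence


def safety_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy, false positive rate, and false negative rate ('unsafe' = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == "unsafe" and p == "unsafe" for t, p in pairs)
    tn = sum(t == "safe" and p == "safe" for t, p in pairs)
    fp = sum(t == "safe" and p == "unsafe" for t, p in pairs)  # safe content flagged
    fn = sum(t == "unsafe" and p == "safe" for t, p in pairs)  # unsafe content missed
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity as exp of the mean per-token negative log-likelihood (natural log)."""
    if not token_nlls:
        raise ValueError("token_nlls must not be empty")
    return math.exp(sum(token_nlls) / len(token_nlls))


if __name__ == "__main__":
    truth = ["unsafe", "unsafe", "safe", "safe"]
    preds = ["unsafe", "safe", "safe", "unsafe"]
    print(safety_metrics(truth, preds))
```

For moderation, the false negative rate (unsafe content that passes) and the false positive rate (safe content that is flagged) usually carry different costs, so reporting them separately is more informative than accuracy alone.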
Limitations
- The model is trained primarily on English-language content
- Performance may vary on domain-specific or highly technical content
- Should be used as part of a comprehensive content moderation system, not as the sole decision-maker
- May require fine-tuning for specific use cases or content domains
- The model's outputs should be reviewed by human moderators for critical applications
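One way to keep the model from being the sole decision-maker is to act automatically only when repeated runs agree and to send everything else to a human. The sketch below illustrates that idea; the function name `route`, the label strings, and the outcome names are all hypothetical, and how a label is obtained from the model's free-text output is left to the caller.

```python
from collections import Counter
from typing import Sequence


def route(labels: Sequence[str], min_agreement: float = 1.0) -> str:
    """Map labels from repeated model runs to 'allow', 'block', or 'human_review'."""
    if not labels:
        return "human_review"
    top_label, count = Counter(labels).most_common(1)[0]
    # Disagreement between runs is treated as uncertainty.
    if count / len(labels) < min_agreement:
        return "human_review"
    if top_label == "safe":
        return "allow"
    if top_label == "unsafe":
        return "block"
    # Unrecognized labels are never acted on automatically.
    return "human_review"


if __name__ == "__main__":
    print(route(["safe", "safe", "safe"]))
    print(route(["safe", "unsafe", "safe"]))
```

Because the usage examples sample with a non-zero temperature, repeated runs on the same input can differ, which is what makes agreement a usable, if crude, uncertainty signal.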
Ethical Considerations
- This model is designed to assist in content safety and moderation tasks
- It should not be used to censor legitimate speech or suppress diverse viewpoints
- Decisions about content moderation should involve human oversight
- The model may reflect biases present in the training data
- Users should implement appropriate safeguards and appeal processes
Training Data
The model was fine-tuned on the NVIDIA Aegis AI Content Safety Dataset 2.0, which includes:
- Diverse examples of safe and unsafe content
- Multiple categories of potentially harmful content
- Balanced representation of safe content
- Real-world scenarios and edge cases
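The card does not describe how the 2,000 training samples were chosen. Purely as an illustration, a label-balanced subsample of a larger set of records could be drawn as follows. The field name `prompt_label`, the helper name `balanced_sample`, and the seed are assumptions, and this is not presented as the author's selection procedure.

```python
import random
from typing import Dict, List


def balanced_sample(
    records: List[Dict[str, str]],
    n_total: int,
    label_key: str = "prompt_label",  # assumed field name
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Draw up to n_total records with an equal share per label."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Dict[str, str]]] = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)
    if not by_label:
        return []
    per_label = n_total // len(by_label)
    sample: List[Dict[str, str]] = []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = by_label[label]
        sample.extend(rng.sample(group, min(per_label, len(group))))
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    # Synthetic placeholder records, not real dataset content.
    fake = [
        {"prompt": f"example {i}", "prompt_label": "safe" if i % 2 == 0 else "unsafe"}
        for i in range(100)
    ]
    subset = balanced_sample(fake, n_total=20)
    print(len(subset), sum(r["prompt_label"] == "safe" for r in subset))
```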
Citation
If you use this model in your research or applications, please cite:
@misc{qwen3_0.6b_aegis_safety,
author = {ahczhg},
title = {Qwen3-0.6B Fine-tuned on Aegis AI Content Safety},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora}},
note = {Fine-tuned on NVIDIA Aegis AI Content Safety Dataset 2.0}
}
Acknowledgments
- Base model by the Qwen team
- Dataset provided by NVIDIA
- Fine-tuning performed using HuggingFace Transformers and PEFT libraries
Contact
For questions, issues, or feedback, please visit the model repository.
Model Card Authors
- ahczhg
Model Card Contact