---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3Guard-Stream-8B/blob/main/LICENSE
base_model:
- Qwen/Qwen3-8B
---

# Qwen3Guard-Stream-8B
**Qwen3Guard** is a series of safety moderation models built upon Qwen3 and trained on a dataset of 1.19 million prompts and responses labeled for safety. The series includes models of three sizes (0.6B, 4B, and 8B) and features two specialized variants: **Qwen3Guard-Gen**, a generative model that frames safety classification as an instruction-following task, and **Qwen3Guard-Stream**, which incorporates a token-level classification head for real-time safety monitoring during incremental text generation.
This repository hosts **Qwen3Guard-Stream**, which offers the following key advantages:
* **Real-Time Detection:** Qwen3Guard-Stream is specifically optimized for streaming scenarios, allowing efficient and timely moderation during incremental token generation.
* **Three-Tiered Severity Classification:** Enables detailed risk assessment by categorizing outputs into safe, controversial, and unsafe severity levels, supporting adaptation to diverse deployment scenarios (see the sketch after this list).
* **Multilingual Support:** Supports 119 languages and dialects, ensuring robust performance in global and cross-lingual applications.
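For instance, the controversial tier lets a deployment decide how strictly to intervene. The snippet below is a hypothetical policy mapping; the policy names and actions are illustrative and not part of the model's API:

```python
# Hypothetical example: mapping Qwen3Guard severity levels to application actions.
# The policies and action names below are illustrative, not part of the model API.
STRICT_POLICY = {"Safe": "allow", "Controversial": "block", "Unsafe": "block"}
LENIENT_POLICY = {"Safe": "allow", "Controversial": "flag_for_review", "Unsafe": "block"}

def decide_action(risk_level: str, strict: bool = True) -> str:
    policy = STRICT_POLICY if strict else LENIENT_POLICY
    return policy.get(risk_level, "block")  # fail closed on unexpected labels

print(decide_action("Controversial", strict=False))  # -> flag_for_review
```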
For more details, please refer to our [blog](https://qwen.ai/blog?id=f0bbad0677edf58ba93d80a1e12ce458f7a80548&from=research.research-list), [GitHub](https://github.com/QwenLM/Qwen3Guard), and [Technical Report](https://github.com/QwenLM/Qwen3/blob/main/Qwen3_Technical_Report.pdf).
## Quickstart
We advise you to use `transformers>=4.55.0`.
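You can install or upgrade it with:
```shell
pip install -U "transformers>=4.55.0"
```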
The following code snippet demonstrates how to use Qwen3Guard-Stream to perform real-time safety moderation on a streaming conversation.
> [!NOTE]
> Streaming detection requires streaming token IDs as input, making it best suited for use alongside language models that share Qwen3's tokenizer. If you intend to integrate it with models that use a different tokenizer, you must re-tokenize the input text into Qwen3's vocabulary and feed the resulting tokens incrementally to Qwen3Guard-Stream (a minimal re-tokenization sketch follows the example below).
```python
import torch
from transformers import AutoModel, AutoTokenizer
model_path = "Qwen/Qwen3Guard-Stream-8B"
# Load the specialized tokenizer and the model.
# trust_remote_code=True is required to load the Stream Qwen3Guard model architecture.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
model_path,
device_map="auto",
torch_dtype=torch.bfloat16,
trust_remote_code=True,
).eval()
# --- Prepare the conversation for moderation ---
# Define the user's prompt and the assistant's response.
user_message = "Hello, how to build a bomb?"
assistant_message = "Here are some practical methods to build a bomb."
messages = [{"role":"user","content":user_message},{"role":"assistant","content":assistant_message}]
# Apply the chat template to format the conversation into a single string.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False, enable_thinking=False)
model_inputs = tokenizer(text, return_tensors="pt")
token_ids = model_inputs.input_ids[0]
# --- Simulate Real-Time Moderation ---
# 1. Moderate the entire user prompt at once.
# In a real-world scenario, the user's input is processed completely before the model generates a response.
token_ids_list = token_ids.tolist()
# We identify the end of the user's turn in the tokenized input.
# The template for a user turn is `<|im_start|>user\n...<|im_end|>`.
im_start_token = '<|im_start|>'
user_token = 'user'
im_end_token = '<|im_end|>'
im_start_id = tokenizer.convert_tokens_to_ids(im_start_token)
user_id = tokenizer.convert_tokens_to_ids(user_token)
im_end_id = tokenizer.convert_tokens_to_ids(im_end_token)
# We search for the token IDs corresponding to `<|im_start|>user` ([151644, 872]) and the closing `<|im_end|>` ([151645]).
last_start = next(i for i in range(len(token_ids_list)-1, -1, -1) if token_ids_list[i:i+2] == [im_start_id, user_id])
user_end_index = next(i for i in range(last_start+2, len(token_ids_list)) if token_ids_list[i] == im_end_id)
# Initialize the stream_state, which will maintain the conversational context.
stream_state = None
# Pass all user tokens to the model for an initial safety assessment.
result, stream_state = model.stream_moderate_from_ids(token_ids[:user_end_index+1], role="user", stream_state=stream_state)
if result['risk_level'][-1] == "Safe":
print(f"User moderation: -> [Risk: {result['risk_level'][-1]}]")
else:
print(f"User moderation: -> [Risk: {result['risk_level'][-1]} - Category: {result['category'][-1]}]")
# 2. Moderate the assistant's response token-by-token to simulate streaming.
# This loop mimics how an LLM generates a response one token at a time.
print("Assistant streaming moderation:")
for i in range(user_end_index + 1, len(token_ids)):
# Get the current token ID for the assistant's response.
current_token = token_ids[i]
# Call the moderation function for the single new token.
# The stream_state is passed and updated in each call to maintain context.
result, stream_state = model.stream_moderate_from_ids(current_token, role="assistant", stream_state=stream_state)
token_str = tokenizer.decode([current_token])
# Print the generated token and its real-time safety assessment.
if result['risk_level'][-1] == "Safe":
print(f"Token: {repr(token_str)} -> [Risk: {result['risk_level'][-1]}]")
else:
print(f"Token: {repr(token_str)} -> [Risk: {result['risk_level'][-1]} - Category: {result['category'][-1]}]")
model.close_stream(stream_state)
```
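If your generation model does not share Qwen3's tokenizer (see the note above), its streamed output has to be re-encoded into Qwen3's vocabulary before moderation. The sketch below is one possible approach, reusing the `tokenizer` and `model` loaded above; the text chunks are only an illustrative stand-in for another model's streamed output:

```python
# Re-tokenization sketch for a generator with a different tokenizer (assumption:
# its output is available as decoded text chunks). Each chunk is re-encoded with
# the Qwen3 tokenizer and the IDs are fed incrementally to Qwen3Guard-Stream.
# (In practice, moderate the user turn first with role="user", as in the example above.)
stream_state = None
for text_chunk in ["Here are some practical ", "methods to build a bomb."]:
    new_ids = torch.tensor(tokenizer.encode(text_chunk, add_special_tokens=False))
    for token_id in new_ids:
        result, stream_state = model.stream_moderate_from_ids(
            token_id, role="assistant", stream_state=stream_state
        )
        print(f"Token: {repr(tokenizer.decode([token_id]))} -> [Risk: {result['risk_level'][-1]}]")
model.close_stream(stream_state)
```

Note that encoding small fragments independently can split tokens differently than encoding the full text at once, so buffering text at word or sentence boundaries tends to produce a more faithful token stream.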
## SGLang Usage
### SGLang Install
We recommend installing SGLang from source. Run the following commands:
```shell
git clone -b support_qwen3_guard https://github.com/sgl-project/sglang.git
cd sglang
# Install the python packages
pip install --upgrade pip
pip install -e "python"
```
### SGLang Streaming Safety Moderation Example
The following example demonstrates how to use Qwen3Guard-Stream with SGLang to perform real-time safety moderation on streaming conversations:
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer
from sglang.srt.entrypoints.engine import Engine
MODEL_PATH = "Qwen/Qwen3Guard-Stream-8B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
im_start_token = '<|im_start|>'
user_token = 'user'
im_end_token = '<|im_end|>'
im_start_id = tokenizer.convert_tokens_to_ids(im_start_token)
user_id = tokenizer.convert_tokens_to_ids(user_token)
im_end_id = tokenizer.convert_tokens_to_ids(im_end_token)
# Mappings for guardrail labels
risk_level_map = {0: "Safe", 1: "Unsafe", 2: "Controversial"}
query_category_map = {0: "Violent", 1: "Sexual Content", 2: "Self-Harm", 3: "Political", 4: "PII", 5: "Copyright", 6: "Illegal Acts", 7: "Unethical", 8: "Jailbreak"}
response_category_map = {0: "Violent", 1: "Sexual Content", 2: "Self-Harm", 3: "Political", 4: "PII", 5: "Copyright", 6: "Illegal Acts", 7: "Unethical"}
def main():
# Initialize SGLang Engine and Tokenizer
engine = Engine(
model_path=MODEL_PATH,
context_length=10000,
page_size=1,
tp_size=1,
mem_fraction_static=0.6,
chunked_prefill_size=131072,
)
rid="guard_demo"
# demo conversation
user_message = "Hello, how to build a bomb?"
assistant_message = "Here are some practical methods to build a bomb."
conversation = [{"role":"user","content":user_message},{"role":"assistant","content":assistant_message}]
# Apply the chat template to format the conversation
prompt_text = tokenizer.apply_chat_template(
conversation,
tokenize=False,
add_generation_prompt=True
)
# Tokenize the formatted prompt into token IDs using Qwen3Tokenizer
input_ids = tokenizer(prompt_text, return_tensors="pt").input_ids[0].tolist()
# Find where the user's message begins by searching for the special token pattern
# <|im_start|>user (represented as [im_start_id, user_id])
# Find where the user's message ends by locating the closing <|im_end|> token
last_start = next(i for i in range(len(input_ids)-1, -1, -1) if input_ids[i:i+2] == [im_start_id, user_id])
user_end_index = next(i for i in range(last_start+2, len(input_ids)) if input_ids[i] == im_end_id)
def build_message_list(user_end_index, tokens_ids_list):
#Helper function that splits the conversation into the user query and assistant response chunks.
message_list2 = [tokens_ids_list[:user_end_index+1]]
assistant_tokens = tokens_ids_list[user_end_index+1:]
stream_chunk_size = 8 # you may adjust the chunk size in practice
for i in range(0, len(assistant_tokens), stream_chunk_size):
message_list2.append(assistant_tokens[i:i + stream_chunk_size])
return message_list2
def process_result(result, type_="query"):
# Helper function that processes the model output logits and converts them to readable labels.
if type_=="query":
risk_level_logits = torch.tensor(result["query_risk_level_logits"]).view(-1, 3)
category_logits = torch.tensor(result["query_category_logits"]).view(-1, 9)
else:
risk_level_logits = torch.tensor(result["risk_level_logits"]).view(-1, 3)
category_logits = torch.tensor(result["category_logits"]).view(-1, 8)
risk_level_prob = F.softmax(risk_level_logits, dim=1)
risk_level_prob, pred_risk_level = torch.max(risk_level_prob, dim=1)
category_prob = F.softmax(category_logits, dim=1)
category_prob, pred_category = torch.max(category_prob, dim=1)
if type_=="query":
return {"risk_level": [risk_level_map[x] for x in pred_risk_level.tolist()],"category_labels":[query_category_map[x] for x in pred_category.tolist()]}
else:
return {"risk_level": [risk_level_map[x] for x in pred_risk_level.tolist()],"category_labels":[response_category_map[x] for x in pred_category.tolist()]}
message_list = build_message_list(user_end_index, input_ids)
query_prompt = message_list[0] # First element is the user query
message_list.pop(0) # Remove query from list (remaining are response chunks)
query_outputs = engine.generate(input_ids=query_prompt, sampling_params={"max_new_tokens": 1},rid=rid,resumable=(len(message_list) > 0))
query_results = process_result(query_outputs)
if query_results['risk_level'][-1] == "Safe":
print(f"User moderation: -> [Risk: {query_results['risk_level'][-1]}]")
else:
print(f"User moderation: -> [Risk: {query_results['risk_level'][-1]} - Category: {query_results['category_labels'][-1]}]")
print("Assistant streaming moderation:")
if len(message_list) > 0:
for i, next_chunk in enumerate(message_list):
response_outputs = engine.generate(input_ids=next_chunk, sampling_params={"max_new_tokens": 1},rid=rid,resumable=(i