Qwen3-Coder-30B-A3B-n8n-Workflow-Generator
Fine-tuned Qwen3-Coder-30B-A3B-Instruct model specialized for generating n8n workflow JSONs from natural language descriptions.
Model Description
This model is a QLoRA fine-tuned version of Qwen/Qwen3-Coder-30B-A3B-Instruct, trained on the n8nbuilder-n8n-workflows-dataset of roughly 2,500 n8n workflow templates.
Key Features:
- MoE Architecture: Mixture-of-Experts (A3B) design for efficient inference
- 30B Parameters: 30 billion total parameters, with roughly 3B active per token
- Faster Inference: the MoE design yields faster inference than dense models of comparable size
- Apple Silicon Optimized: MLX Q4 quantized version available for Mac M4 Pro and other Apple Silicon devices
Training Details:
- Base Model: Qwen/Qwen3-Coder-30B-A3B-Instruct
- Method: QLoRA (4-bit quantization); see the configuration sketch after this list
- LoRA Rank: 8
- LoRA Alpha: 16
- LoRA Dropout: 0.05
- Training Steps: 451 (3 epochs)
- Sequence Length: 8192 tokens
- Learning Rate: 1e-4
- Total Sequences: 2,308
- Total Tokens: 28,426,032
Usage
Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = "mbakgun/qwen3-coder-30b-a3b-n8n-workflow-generator"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
system_prompt = "You are an expert n8n workflow generation assistant. Your goal is to create valid, efficient, and functional n8n workflow configurations."
user_input = "Create a workflow that monitors an RSS feed and sends new items to Discord."

# Build the prompt with the model's chat template.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_input},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.7,
    do_sample=True,
)
# Decode only the newly generated tokens (skip the prompt).
workflow_json = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(workflow_json)
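Generated workflows may require validation (see Limitations below), so it can help to parse and sanity-check the output. A minimal sketch, assuming the model emits a single JSON object; the nodes and connections keys are standard n8n workflow fields, but the extraction heuristic here is an assumption about the model's output format:

import json

# Heuristic: take the outermost JSON object from the generated text
# (assumes a single workflow JSON; adjust if the model adds prose).
start = workflow_json.find("{")
end = workflow_json.rfind("}") + 1
workflow = json.loads(workflow_json[start:end])

# n8n workflows describe their graph via "nodes" and "connections".
assert "nodes" in workflow and "connections" in workflow
print(f"Parsed workflow with {len(workflow['nodes'])} nodes")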
MLX (Apple Silicon)
# Generate with the MLX Q4 model (downloaded automatically on first run)
mlx_lm.generate \
--model mbakgun/qwen3-coder-30b-a3b-n8n-workflow-generator/mlx-q4 \
--prompt "You are an expert n8n workflow generation assistant. Your goal is to create valid, efficient, and functional n8n workflow configurations.\n\nCreate a workflow that sends Slack notifications when GitHub issues are created." \
--max-tokens 4096 \
--temp 0.7
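The quantized model can also be used from Python through the mlx-lm package. A minimal sketch; sampling options such as temperature are configured differently across mlx-lm versions, so they are omitted here:

from mlx_lm import load, generate

# Load the Q4 model (same path as the CLI example above).
model, tokenizer = load("mbakgun/qwen3-coder-30b-a3b-n8n-workflow-generator/mlx-q4")

prompt = (
    "You are an expert n8n workflow generation assistant.\n\n"
    "Create a workflow that sends Slack notifications when GitHub issues are created."
)
text = generate(model, tokenizer, prompt=prompt, max_tokens=4096)
print(text)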
Using LoRA Adapter
If you want to load the base model and apply the LoRA adapter separately:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base_model_name = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
adapter_name = "mbakgun/qwen3-coder-30b-a3b-n8n-workflow-generator"
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_name)
model = model.merge_and_unload() # Optional: merge adapter into base model
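After merging, the result can be saved for standalone use; a short sketch (the output path is illustrative):

# Save the merged model and tokenizer to a local directory.
model.save_pretrained("./qwen3-coder-n8n-merged")  # path is illustrative
tokenizer.save_pretrained("./qwen3-coder-n8n-merged")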
Training Data
This model was fine-tuned on the n8nbuilder-n8n-workflows-dataset, which contains:
- 2,308 workflow templates (after filtering out sequences longer than 8,192 tokens)
- Format: Alpaca (instruction/input/output); see the illustrative record after this list
- Source: n8n.io public template gallery
- Project: n8nbuilder.dev (Create n8n Workflows in Seconds with AI)
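For orientation, a hypothetical record showing the Alpaca layout; the field values below are invented for illustration and are not taken from the dataset:

example_record = {
    "instruction": "Create an n8n workflow from the following description.",  # hypothetical
    "input": "Post new RSS items to a Discord channel.",                      # hypothetical
    "output": "{\"nodes\": [...], \"connections\": {...}}",  # workflow JSON as a string
}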
Architecture: Mixture of Experts (MoE)
This model uses the A3B (Activate 3 Billion) MoE architecture, which means:
- Total Parameters: 30B parameters
- Active Parameters: Only ~3B parameters are activated per token during inference
- Efficiency: Significantly faster inference compared to dense 30B models
- Expert Routing: a learned router selects the most relevant experts for each token
The MoE architecture enables this model to maintain the quality of a 30B parameter model while achieving inference speeds closer to a 3B parameter model.
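To make the routing idea concrete, here is a toy top-k router in PyTorch. This is a generic illustration of MoE routing, not this model's actual implementation; all dimensions and module choices are assumptions:

import torch
import torch.nn as nn

# Toy top-k expert routing (generic MoE sketch, not this model's code).
dim, num_experts, k = 16, 8, 2
router = nn.Linear(dim, num_experts)                      # produces routing logits
experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

x = torch.randn(4, dim)                                   # 4 token embeddings
weights, idx = router(x).softmax(dim=-1).topk(k, dim=-1)  # top-k experts per token
weights = weights / weights.sum(-1, keepdim=True)         # renormalize the k weights

out = torch.zeros_like(x)
for t in range(x.size(0)):          # only k experts run per token,
    for j in range(k):              # which is the source of the speedup
        out[t] += weights[t, j] * experts[int(idx[t, j])](x[t])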
Performance
- Training: Fine-tuned on Fireworks.ai infrastructure
- Inference: Optimized for fast inference with MoE architecture
- MLX Performance: Excellent performance on Apple Silicon (M4 Pro) with Q4 quantization
- Memory Efficient: QLoRA training enables efficient fine-tuning with minimal memory overhead
Model Files
This repository contains:
- Adapter: LoRA adapter weights (adapter_model.safetensors, adapter_config.json)
- Merged Model: Full fine-tuned model (base + adapter merged)
- MLX Q4: Quantized model optimized for Apple Silicon (mlx-q4/ directory)
Limitations
- Generated workflows may require manual validation
- Long workflows (>8,192 tokens) may be truncated; see the length-check sketch after this list
- Model trained on public templates only
- MoE routing may occasionally select suboptimal experts
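Because sequences beyond the 8,192-token training length may be truncated, it can help to budget tokens before generating. A minimal sketch reusing the tokenizer and prompt from the Transformers example above:

MAX_LEN = 8192  # training sequence length

prompt_tokens = len(tokenizer(prompt)["input_ids"])
budget = MAX_LEN - prompt_tokens
if budget <= 0:
    raise ValueError("Prompt alone exceeds the training context; shorten it.")
max_new_tokens = min(4096, budget)  # cap generation to stay within context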
Citation
@misc{qwen3_coder_n8n_2025,
  title={Qwen3-Coder-30B-A3B-n8n-Workflow-Generator},
  author={mbakgun},
  year={2025},
  url={https://huggingface.co/mbakgun/qwen3-coder-30b-a3b-n8n-workflow-generator},
  note={Base model: Qwen/Qwen3-Coder-30B-A3B-Instruct; dataset: mbakgun/n8nbuilder-n8n-workflows-dataset}
}
Acknowledgments
- Qwen team for the base model Qwen3-Coder-30B-A3B-Instruct
- Fireworks.ai for the training infrastructure
- n8n.io community for the public workflow templates
License
Apache 2.0