Qwen3-Coder-30B-A3B-n8n-Workflow-Generator
Fine-tuned Qwen3-Coder-30B-A3B-Instruct model specialized for generating n8n workflow JSONs from natural language descriptions.
Model Description
This model is a QLoRA fine-tuned version of Qwen/Qwen3-Coder-30B-A3B-Instruct, trained on the n8nbuilder-n8n-workflows-dataset of roughly 2,500 n8n workflow templates.
Key Features:
- MoE Architecture: Mixture-of-Experts (A3B) design for efficient inference
- 30B Parameters: 30 billion total parameters, with roughly 3B active per token
- Faster Inference: the MoE design yields faster inference than dense models of comparable size
- Apple Silicon Optimized: MLX Q4 quantized version available for Mac M4 Pro and other Apple Silicon devices
Training Details:
- Base Model: Qwen/Qwen3-Coder-30B-A3B-Instruct
- Method: QLoRA (4-bit quantization); see the configuration sketch after this list
- LoRA Rank: 8
- LoRA Alpha: 16
- LoRA Dropout: 0.05
- Training Steps: 451 (3 epochs)
- Sequence Length: 8192 tokens
- Learning Rate: 1e-4
- Total Sequences: 2,308
- Total Tokens: 28,426,032
Usage
Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = "mbakgun/qwen3-coder-30b-a3b-n8n-workflow-generator"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
system_prompt = "You are an expert n8n workflow generation assistant. Your goal is to create valid, efficient, and functional n8n workflow configurations."
user_input = "Create a workflow that monitors an RSS feed and sends new items to Discord."

# Build the prompt with the model's chat template.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_input},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.7,
    do_sample=True,
)
# Decode only the newly generated tokens (skip the prompt).
workflow_json = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(workflow_json)
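Generated workflows may require validation (see Limitations below), so it can help to parse and sanity-check the output. A minimal sketch, assuming the model emits a single JSON object; the nodes and connections keys are standard n8n workflow fields, but the extraction heuristic here is an assumption about the model's output format:

import json

# Heuristic: take the outermost JSON object from the generated text
# (assumes a single workflow JSON; adjust if the model adds prose).
start = workflow_json.find("{")
end = workflow_json.rfind("}") + 1
workflow = json.loads(workflow_json[start:end])

# n8n workflows describe their graph via "nodes" and "connections".
assert "nodes" in workflow and "connections" in workflow
print(f"Parsed workflow with {len(workflow['nodes'])} nodes")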
MLX (Apple Silicon)
# Generate with the MLX Q4 model (downloaded automatically on first run)
mlx_lm.generate \
--model mbakgun/qwen3-coder-30b-a3b-n8n-workflow-generator/mlx-q4 \
--prompt "You are an expert n8n workflow generation assistant. Your goal is to create valid, efficient, and functional n8n workflow configurations.\n\nCreate a workflow that sends Slack notifications when GitHub issues are created." \
--max-tokens 4096 \
--temp 0.7
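The quantized model can also be used from Python through the mlx-lm package. A minimal sketch; sampling options such as temperature are configured differently across mlx-lm versions, so they are omitted here:

from mlx_lm import load, generate

# Load the Q4 model (same path as the CLI example above).
model, tokenizer = load("mbakgun/qwen3-coder-30b-a3b-n8n-workflow-generator/mlx-q4")

prompt = (
    "You are an expert n8n workflow generation assistant.\n\n"
    "Create a workflow that sends Slack notifications when GitHub issues are created."
)
text = generate(model, tokenizer, prompt=prompt, max_tokens=4096)
print(text)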
Using LoRA Adapter
If you want to load the base model and apply the LoRA adapter separately:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base_model_name = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
adapter_name = "mbakgun/qwen3-coder-30b-a3b-n8n-workflow-generator"
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_name)
model = model.merge_and_unload() # Optional: merge adapter into base model
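After merging, the result can be saved for standalone use; a short sketch (the output path is illustrative):

# Save the merged model and tokenizer to a local directory.
model.save_pretrained("./qwen3-coder-n8n-merged")  # path is illustrative
tokenizer.save_pretrained("./qwen3-coder-n8n-merged")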
Training Data
This model was fine-tuned on the n8nbuilder-n8n-workflows-dataset, which contains:
- 2,308 workflow templates (after filtering out sequences longer than 8,192 tokens)
- Format: Alpaca (instruction/input/output); see the illustrative record after this list
- Source: n8n.io public template gallery
- Project: n8nbuilder.dev (Create n8n Workflows in Seconds with AI)
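For orientation, a hypothetical record showing the Alpaca layout; the field values below are invented for illustration and are not taken from the dataset:

example_record = {
    "instruction": "Create an n8n workflow from the following description.",  # hypothetical
    "input": "Post new RSS items to a Discord channel.",                      # hypothetical
    "output": "{\"nodes\": [...], \"connections\": {...}}",  # workflow JSON as a string
}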
Architecture: Mixture of Experts (MoE)
This model uses the A3B (Activate 3 Billion) MoE architecture, which means:
- Total Parameters: 30B parameters
- Active Parameters: Only ~3B parameters are activated per token during inference
- Efficiency: Significantly faster inference compared to dense 30B models
- Expert Routing: a learned router selects the most relevant experts for each token
The MoE architecture enables this model to maintain the quality of a 30B parameter model while achieving inference speeds closer to a 3B parameter model.
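To make the routing idea concrete, here is a toy top-k router in PyTorch. This is a generic illustration of MoE routing, not this model's actual implementation; all dimensions and module choices are assumptions:

import torch
import torch.nn as nn

# Toy top-k expert routing (generic MoE sketch, not this model's code).
dim, num_experts, k = 16, 8, 2
router = nn.Linear(dim, num_experts)                      # produces routing logits
experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

x = torch.randn(4, dim)                                   # 4 token embeddings
weights, idx = router(x).softmax(dim=-1).topk(k, dim=-1)  # top-k experts per token
weights = weights / weights.sum(-1, keepdim=True)         # renormalize the k weights

out = torch.zeros_like(x)
for t in range(x.size(0)):          # only k experts run per token,
    for j in range(k):              # which is the source of the speedup
        out[t] += weights[t, j] * experts[int(idx[t, j])](x[t])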
Performance
- Training: Fine-tuned on Fireworks.ai infrastructure
- Inference: Optimized for fast inference with MoE architecture
- MLX Performance: Excellent performance on Apple Silicon (M4 Pro) with Q4 quantization
- Memory Efficient: QLoRA training enables efficient fine-tuning with minimal memory overhead
Model Files
This repository contains:
- Adapter: LoRA adapter weights (adapter_model.safetensors, adapter_config.json)
- Merged Model: Full fine-tuned model (base + adapter merged)
- MLX Q4: Quantized model optimized for Apple Silicon (mlx-q4/ directory)
Limitations
- Generated workflows may require manual validation
- Long workflows (>8,192 tokens) may be truncated; see the length-check sketch after this list
- Model trained on public templates only
- MoE routing may occasionally select suboptimal experts
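Because sequences beyond the 8,192-token training length may be truncated, it can help to budget tokens before generating. A minimal sketch reusing the tokenizer and prompt from the Transformers example above:

MAX_LEN = 8192  # training sequence length

prompt_tokens = len(tokenizer(prompt)["input_ids"])
budget = MAX_LEN - prompt_tokens
if budget <= 0:
    raise ValueError("Prompt alone exceeds the training context; shorten it.")
max_new_tokens = min(4096, budget)  # cap generation to stay within context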
Citation
@misc{qwen3_coder_n8n_2025,
  title={Qwen3-Coder-30B-A3B-n8n-Workflow-Generator},
  author={mbakgun},
  year={2025},
  url={https://huggingface.co/mbakgun/qwen3-coder-30b-a3b-n8n-workflow-generator},
  note={Base model: Qwen/Qwen3-Coder-30B-A3B-Instruct; dataset: mbakgun/n8nbuilder-n8n-workflows-dataset}
}
Acknowledgments
- Qwen team for the base model Qwen3-Coder-30B-A3B-Instruct
- Fireworks.ai for the training infrastructure
- n8n.io community for the public workflow templates
License
Apache 2.0