🧪 Model Behavior Incident Log

Collapse Symptoms After Annealing + SFT (Gemma-3-270M-TW)

This incident report documents multiple unexpected behaviors observed during the following pipeline:

  • Gemma-3-270M (base) → CPT (continued pre-training) → Dialogue Annealing → SFT (supervised fine-tuning)
  • Training was performed with LLaMA-Factory on multiple Traditional Chinese datasets.

The purpose of this log is to document:

  • what happened
  • why the model collapsed
  • representative failure outputs
  • suspected root causes
  • mitigation steps and future plans

📉 Summary of the Incident

After the CPT → Phase-3 Dialogue Annealing → SFT stage, the model displayed severe conversational degradation, including:

  • repetitive loops
  • meaningless vocalization (e.g., hihihihi...)
  • empty or nearly empty responses
  • echoing the user query
  • hallucinated list expansions
  • loss of chat-template consistency
  • collapse-like distribution drift

These issues did not appear in earlier checkpoints (e.g., CPT → SFT without annealing).
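For reference, the failure outputs below were reproduced with a simple greedy-decoding spot check. A minimal sketch, assuming the standard Hugging Face transformers API; the checkpoint id and decoding settings are illustrative and may not match the exact settings used in the original runs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint id; any intermediate checkpoint path can be substituted.
CKPT = "modelcollapse/gemma-3-270m-it-cpt-annealing-sft-251126"

tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForCausalLM.from_pretrained(CKPT, torch_dtype=torch.bfloat16)

def chat(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy decoding through the checkpoint's chat template."""
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    )
    output = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt tokens and return only the generated reply.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(chat("你好"))  # reproduces the degenerate replies documented below
```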


πŸ” Representative Failure Outputs

1. Character-level repetition

Prompt:

hi

Output:

hihihihihihihihihihihihihihihi...

2. Garbled symbols / noise

Prompt:

你好 ("Hello")

Output:

總!!」!」!」!」!"!」!"!"!」!"」!"

3. Semantic drift + self-looping

Prompt:

圓的積分公式 ("the integral formula for a circle")

Output:

是什麼?
有,他們都是那個。
你們都 是那個。
我 還是在那個裡頭,
所以,沒了,沒有,沒了。。
我也不知道。
我知道。
知道。
知道了。

(roughly: "What is it? Yes, they're all that one. You're all that one. I'm still inside that one, so, gone, none, gone.. I don't know either. I know. I know. Got it.")

4. Query echoing + hallucinated QA scaffolding

Prompt:

請用一ε₯θ©±θ§£ι‡‹εœ°ηƒη‚Ίδ»€ιΊΌζœƒθ‡ͺ轉。

Output:

請用一ε₯θ©±θ§£ι‡‹εœ°ηƒη‚Ίδ»€ιΊΌζœƒθ‡ͺ轉。
- ζ Ήζ“šη‰©η†ε­ΈοΌŒδ»€ιΊΌζ˜―γ€Œζ°£ι«”γ€οΌŸ
- δ»€ιΊΌζ˜―γ€Œη£ε ΄γ€οΌŸ

5. Baseline sanity check (pre-annealing) behaves normally

Prompt:

你好阿 ("Hello there")

Earlier output:

你好!很高興能和你一起交流,我是台灣大語言模型 Formosa-1... ("Hello! I'm very glad to chat with you. I am Formosa-1, Taiwan's large language model...")

🧭 Suspected Root Causes

1. Annealing Dataset Format Drift

The annealing data used raw plain-text turns:

user: ...
assistant: ...

instead of structured JSON chat messages. A conversion sketch is given below.
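One mitigation is to lift the raw turns into explicit chat messages before training. A minimal sketch; the raw_turns_to_messages helper and the OpenAI/ShareGPT-style messages schema are illustrative choices, not a specific LLaMA-Factory loader:

```python
import json

def raw_turns_to_messages(raw_text: str) -> dict:
    """Convert loose `user: ... / assistant: ...` lines into structured chat messages."""
    messages = []
    for line in raw_text.strip().splitlines():
        line = line.strip()
        lower = line.lower()
        if lower.startswith("user:"):
            messages.append({"role": "user", "content": line[len("user:"):].strip()})
        elif lower.startswith("assistant:"):
            messages.append({"role": "assistant", "content": line[len("assistant:"):].strip()})
    return {"messages": messages}

raw = "user: 你好\nassistant: 你好!很高興和你交流。"
print(json.dumps(raw_turns_to_messages(raw), ensure_ascii=False, indent=2))
```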

2. Overwriting Instruction Alignment

Dialogue annealing is fragile at the 270M scale and likely overwrote the instruction alignment learned earlier → possible distribution collapse.

3. SFT Unable to Recover

During the subsequent SFT run, training loss decreased, but the entropy collapse in the model's generations persisted.
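A cheap way to track this independently of the training loss is to measure the average next-token entropy of the model's own generations. A minimal sketch, assuming the transformers generate API; the prompt set and token budget are illustrative:

```python
import torch

def mean_next_token_entropy(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> float:
    """Average entropy (in nats) of the next-token distribution during greedy decoding.
    A collapsing model drifts toward near-zero entropy even while SFT loss keeps falling."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            output_scores=True,
            return_dict_in_generate=True,
        )
    entropies = []
    for step_logits in out.scores:  # one (batch, vocab) logits tensor per generated token
        probs = torch.softmax(step_logits.float(), dim=-1)
        entropies.append(-(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).item())
    return sum(entropies) / max(len(entropies), 1)
```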


🛠 Actions Taken

  • Rebuilt CPT dataset (2.4M samples, 70/30 mix)
  • Adopted a stable LR tail (cosine_with_min_lr; sketched after this list)
  • Will rebuild annealing data with strict chat format
  • Full SFT pipeline will be repeated after clean CPT
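For reference, the intent of the stable LR tail is a cosine decay that flattens at a non-zero floor instead of reaching 0. A minimal sketch of that shape; it is illustrative only and does not claim to reproduce the exact cosine_with_min_lr implementation in the training framework:

```python
import math

def cosine_with_min_lr(step: int, total_steps: int, peak_lr: float,
                       min_lr: float, warmup_steps: int = 0) -> float:
    """Cosine decay from peak_lr down to a non-zero floor min_lr."""
    if step < warmup_steps:  # optional linear warmup
        return peak_lr * step / max(warmup_steps, 1)
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return min_lr + (peak_lr - min_lr) * cosine

# e.g. cosine_with_min_lr(step, total_steps=10_000, peak_lr=3e-4, min_lr=3e-5)
```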

🔄 Next Steps

  1. Complete clean CPT
  2. Rebuild dialogue annealing
  3. Run annealing with the LR decaying from min_lr → 0
  4. Re-run SFT
  5. Compare ablation paths
  6. Add regression tests (see the sketch after this list)
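A minimal sketch of what those regression tests could look like, covering the empty-reply, query-echo, and repetition-loop failure modes above; the generate_reply wrapper and all thresholds are hypothetical:

```python
def degeneration_reason(prompt: str, reply: str, max_repeat_ratio: float = 0.5):
    """Return a failure label for the degenerate behaviors in this log, or None if the reply looks fine."""
    reply = reply.strip()
    if not reply:
        return "empty response"
    if reply.startswith(prompt.strip()):
        return "query echo"
    # repetition loop: how much of the reply is covered by its single most common character bigram
    bigrams = [reply[i:i + 2] for i in range(len(reply) - 1)]
    if bigrams:
        top = max(set(bigrams), key=bigrams.count)
        if bigrams.count(top) / len(bigrams) > max_repeat_ratio:
            return "repetition loop"
    return None

def test_no_degenerate_replies():
    prompts = ["hi", "你好", "圓的積分公式"]
    for prompt in prompts:
        reply = generate_reply(prompt)  # hypothetical wrapper around the candidate checkpoint
        assert degeneration_reason(prompt, reply) is None, f"{prompt!r} -> {reply!r}"
```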

🧭 Closing Note

Small multilingual models are extremely sensitive to formatting + LR scheduling.
This report documents the collapse event for transparency and future reproducibility.
