🧪 Model Behavior Incident Log
Collapse Symptoms After Annealing + SFT (Gemma-3-270M-TW)
This incident report documents multiple unexpected behaviors observed during the following pipeline:
- Gemma-3-270M (base) → CPT → Dialogue Annealing → SFT
- Training performed with LLaMA-Factory on multiple Traditional Chinese datasets.
The purpose of this log is to document:
- what happened
- why the model collapsed
- representative failure outputs
- suspected root causes
- mitigation steps and future plans
Summary of the Incident
After the CPT → Phase-3 Dialogue Annealing → SFT stage, the model displayed severe conversational degradation, including:
- repetitive loops
- meaningless vocalization (e.g., `hihihihi...`)
- empty or nearly empty responses
- echoing the user query
- hallucinated list expansions
- loss of chat-template consistency
- collapse-like distribution drift
These issues did not appear in earlier checkpoints (e.g., CPT → SFT without annealing).
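Several of the symptoms above (loops, echoes, near-empty replies) are cheap to detect mechanically. A minimal repetition probe, using only the Python standard library (the function name and threshold are my illustration, not part of the training pipeline):

```python
def max_repeat_ratio(text: str, max_period: int = 8) -> float:
    """Fraction of `text` covered by the longest run of a repeating unit
    of length <= max_period; values near 1.0 indicate a repetition loop."""
    if not text:
        return 0.0
    best = 0.0
    for period in range(1, min(max_period, len(text)) + 1):
        run = longest = period  # current and longest run length, in characters
        for i in range(period, len(text)):
            if text[i] == text[i - period]:
                run += 1
                longest = max(longest, run)
            else:
                run = period
        best = max(best, longest / len(text))
    return best

print(max_repeat_ratio("hihihihihihihihi"))            # 1.0: pure "hi" loop
print(max_repeat_ratio("a normal, varied sentence."))  # well below 0.5
```

A cutoff around 0.6 is a reasonable starting point for flagging collapse-style generations like the ones documented below.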
Representative Failure Outputs
1. Character-level repetition
Prompt:
hi
Output:
hihihihihihihihihihihihihihihi...
2. Garbled symbols / noise
Prompt:
你好 ("Hello")
Output (punctuation noise; the sample itself is encoding-damaged, beginning with 總, "total"):
ηΈ½οΌοΌγοΌγοΌγοΌγ!βοΌγοΌβοΌβοΌγ!βγ!β
3. Semantic drift + self-looping
Prompt (encoding-damaged; only fragments such as 物 and 公式, "formula", are recoverable):
εηη©εε¬εΌ
Output:
是什麼？
我，他們都是那個。
你們都是那個。
我還是在那個裡面？
所以，沒了，沒有，沒了。。
我也不知道。
我知道。
知道。
知道了。
("What is it? / Me, they're all that one. / You're all that one. / Am I still inside that one? / So, it's gone, there's nothing, it's gone.. / I don't know either. / I know. / Know. / Got it.")
4. Query echoing + hallucinated QA scaffolding
Prompt:
請用一句話解釋地球為什麼有自轉。("Explain in one sentence why the Earth rotates.")
Output:
請用一句話解釋地球為什麼有自轉。
- 根據物理學，什麼是「氣體」？("According to physics, what is a 'gas'?")
- 什麼是「磁場」？("What is a 'magnetic field'?")
5. Baseline sanity check (pre-annealing) behaves normally
Prompt:
你好阿 ("Hi there")
Earlier output:
你好！很高興能和你一起交流！我是台灣大語言模型 Formosa-1...
("Hello! I'm glad to chat with you! I'm the Taiwanese large language model Formosa-1...")
Suspected Root Causes
1. Annealing Dataset Format Drift
The annealing data used raw transcripts:
user: ...
assistant: ...
instead of structured JSON messages, so the chat template was never applied consistently.
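For reference, the structured target resembles LLaMA-Factory's sharegpt schema. A minimal conversion sketch (the function and role mapping are my own illustration; field names follow the sharegpt convention and should be checked against the project's dataset_info.json entry):

```python
import json

ROLE_MAP = {"user": "human", "assistant": "gpt"}  # sharegpt role tags

def raw_to_sharegpt(raw: str) -> dict:
    """Convert a raw user:/assistant: transcript into one sharegpt-style record."""
    conversations = []
    for line in raw.strip().splitlines():
        role, _, text = line.partition(":")
        role = role.strip().lower()
        if role not in ROLE_MAP:
            raise ValueError(f"unexpected role tag: {role!r}")
        conversations.append({"from": ROLE_MAP[role], "value": text.strip()})
    return {"conversations": conversations}

record = raw_to_sharegpt("user: 你好\nassistant: 你好！很高興認識你。")
print(json.dumps(record, ensure_ascii=False))
```

Keeping the data in this shape lets the training framework, not the dataset, decide how role markers are rendered into the chat template.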
2. Overwriting Instruction Alignment
Annealing is fragile for 270M models → possible distribution collapse.
3. SFT Unable to Recover
SFT loss decreased, but the entropy collapse persisted.
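The entropy collapse can be made measurable with a crude proxy (an assumed diagnostic, not the metric tracked during training): Shannon entropy of the output's character distribution, which drops sharply for looping generations.

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the empirical character
    distribution of `text`; looping outputs score far lower than prose."""
    if not text:
        return 0.0
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())

print(char_entropy("hihihihihihihihi"))  # 1.0: two symbols, p = 0.5 each
print(char_entropy("a healthy, varied model response"))
```

Tracking this over checkpoints would show whether SFT actually restores output diversity or merely lowers loss.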
Actions Taken
- Rebuilt CPT dataset (2.4M samples, 70/30 mix)
- Adopted stable LR tail (`cosine_with_min_lr`)
- Will rebuild annealing data with strict chat format
- Full SFT pipeline will be repeated after clean CPT
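The adopted LR tail can be sketched as a cosine decay that floors at min_lr instead of reaching 0. This mirrors what a `cosine_with_min_lr` scheduler does in spirit; the exact warmup handling and formula in the actual trainer may differ:

```python
import math

def cosine_with_min_lr(step: int, total_steps: int,
                       peak_lr: float, min_lr: float) -> float:
    """Cosine decay from peak_lr down to a min_lr floor (warmup omitted)."""
    progress = min(step, total_steps) / total_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1.0 -> 0.0
    return min_lr + (peak_lr - min_lr) * cosine

print(cosine_with_min_lr(0, 1000, 5e-4, 5e-5))     # starts at peak_lr
print(cosine_with_min_lr(1000, 1000, 5e-4, 5e-5))  # ends at the min_lr floor
```

Ending CPT at a non-zero floor leaves a live learning rate for the subsequent annealing stage to decay from min_lr to 0, rather than restarting from a dead schedule.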
Next Steps
- Complete clean CPT
- Rebuild dialogue annealing
- Anneal from min_lr → 0
- Re-run SFT
- Compare ablation paths
- Add regression tests
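The regression tests could start as simple property checks over (prompt, output) pairs, one per failure mode documented above. A sketch (the checks and thresholds are my illustration, not an existing suite):

```python
def passes_regression(prompt: str, output: str) -> bool:
    """Reject outputs showing the documented failure modes:
    empty replies, verbatim query echoes, and repetition loops."""
    out = output.strip()
    if not out:                         # empty / nearly empty response
        return False
    if out.startswith(prompt.strip()):  # query echoing
        return False
    # Crude loop check: a short leading chunk repeated across most of the text.
    chunk = out[: max(1, len(out) // 8)]
    if len(out) >= 8 and out.count(chunk) * len(chunk) > 0.8 * len(out):
        return False
    return True

print(passes_regression("你好", "hihihihihihihihi"))       # False: repetition loop
print(passes_regression("你好", "Hello! How can I help?"))  # True
```

Running such checks on a fixed prompt set after every training stage would catch a collapse like this one before SFT is launched on top of it.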
Closing Note
Small multilingual models are extremely sensitive to data formatting and LR scheduling.
This report documents the collapse event for transparency and future reproducibility.