Model Overview
The beachcities/qwen3-4b-sft-v5h-hybrid-merged is a 4-billion-parameter causal language model, fine-tuned from the Qwen/Qwen2.5-3B-Instruct base model. It is designed primarily for strict structured-data generation from unstructured text, supporting formats such as JSON, YAML, TOML, XML, and CSV via zero-shot prompting. The model is an experimental artifact for studying the "Empty Think Injection" mechanism and the alignment of small LLMs for precise structural output.
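One way to exercise the zero-shot structured-output behavior is to wrap the source text in a minimal chat prompt. The `build_extraction_messages` helper and its system-prompt wording below are illustrative assumptions, not the card's official template:

```python
# Sketch of a zero-shot prompt for structured extraction. The message layout
# and instruction wording are assumptions; adapt to your chat template.
def build_extraction_messages(text: str, fmt: str = "json") -> list[dict]:
    """Build chat messages asking the model to emit only the target format."""
    system = (
        f"Convert the user's text into valid {fmt.upper()}. "
        f"Output only the {fmt.upper()} document, with no commentary."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

messages = build_extraction_messages("Alice is 30 and lives in Lisbon.", fmt="json")
```

The resulting list can be passed to a tokenizer's chat-template machinery before generation.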
Key Capabilities & Training
- Structured Data Generation: Excels at converting free-form text into various structured data formats with high fidelity.
- Hybrid Optimization: Trained using a two-stage process: Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO).
- Specialized Dataset: The training dataset (v5h recipe) was engineered to balance "Offense" (deep structural learning) and "Defense" (preventing format hallucination), including expanded CSV flattening and deep TOML synthesis/reading tasks.
- Evaluation: Scored 0.7343 on the public StructEval leaderboard, where the overall score weights Render at 0.2 and Syntax Key Recall at 0.8.
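Assuming the leaderboard combines the two sub-scores as a weighted sum with the stated 0.2/0.8 weights, the aggregation can be sketched as below; the sub-score values in the example are hypothetical, not the model's reported numbers:

```python
def struct_eval_score(render: float, syntax_key_recall: float) -> float:
    # Weighted combination assumed from the stated weights:
    # 0.2 for Render, 0.8 for Syntax Key Recall.
    return 0.2 * render + 0.8 * syntax_key_recall

# Hypothetical sub-scores to illustrate the combination (not reported values):
example = struct_eval_score(render=0.90, syntax_key_recall=0.80)  # ≈ 0.82
```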
Mechanistic Observations & Limitations
This model offers insight into the tension between formatting and reasoning in LLMs. Observed failure modes include pre-training biases that lead to format hallucination (e.g., defaulting to inline TOML tables or wrapping XML output in code blocks) and CoT interference, where chain-of-thought reasoning tokens disrupt structural integrity. The model serves as a baseline for validating the "Empty Think Injection" technique, which mitigates this interference by forcing an empty <think></think> block so that generation preserves structural precision.
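The injection itself can be sketched as a simple prompt transform: append an empty reasoning block so decoding begins at the structured output rather than a chain of thought. The function name and exact placement are assumptions; in practice this would be spliced into the chat template's assistant turn:

```python
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def inject_empty_think(prompt: str) -> str:
    """Append an empty <think></think> block to skip chain-of-thought tokens.

    A minimal sketch of the "Empty Think Injection" idea: the closed, empty
    reasoning block signals that thinking is done, so the model's next tokens
    are the structured document itself.
    """
    return prompt + THINK_OPEN + "\n\n" + THINK_CLOSE + "\n\n"

primed = inject_empty_think("Convert to TOML: name=Alice, age=30\n")
```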