ManniX-ITA/Qwen3.5-4B-M1-Dare-Ties
ManniX-ITA/Qwen3.5-4B-M1-Dare-Ties is a 4.5-billion-parameter language model based on the Qwen3.5-4B architecture, created by ManniX-ITA. It is a vanilla DARE-TIES merge of Qwen3.5-4B with two distilled fine-tunes and serves as the baseline in a comparative study on coding benchmarks. It has a 32768-token context length and is part of an investigation into how merge recipes and importance-signal weighting affect coding performance.
Overview
ManniX-ITA/Qwen3.5-4B-M1-Dare-Ties is a 4.5-billion-parameter model derived from the Qwen3.5-4B base, developed by ManniX-ITA. It is the M1 variant in a series of models exploring different merging techniques and importance-signal weighting for coding tasks. M1 is a vanilla DARE-TIES merge of two distillation fine-tunes onto the base: Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2 (weight 0.55) and Crownelius/Crow-4B-Opus-4.6-Distill-Heretic_Qwen3.5 (weight 0.45).
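To make the merge recipe concrete, here is a minimal sketch of the general DARE-TIES idea on toy parameter lists. It is not the author's actual merge script or tooling: the function names, the `density` value, and the toy numbers are illustrative assumptions, and only the 0.55 and 0.45 weights come from the description above. DARE randomly drops entries of each fine-tune's delta from the base and rescales the survivors; TIES then elects a sign per parameter and merges only the deltas that agree with it.

```python
import random

def dare_prune(delta, density, rng):
    """DARE step: keep each entry with probability `density` and rescale
    survivors by 1/density so the expected delta is unchanged."""
    return [d / density if rng.random() < density else 0.0 for d in delta]

def dare_ties_merge(base, finetuned, weights, density=0.5, seed=0):
    """Toy DARE-TIES merge over flat lists of floats.

    `density` is a hypothetical setting; the source does not state the
    value used for M1.
    """
    rng = random.Random(seed)
    # Task vectors: what each fine-tune changed relative to the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    pruned = [dare_prune(d, density, rng) for d in deltas]

    merged = []
    for i, b in enumerate(base):
        contribs = [w * p[i] for w, p in zip(weights, pruned)]
        total = sum(contribs)
        sign = (total > 0) - (total < 0)  # TIES sign election
        # Keep only contributions whose sign matches the elected sign.
        agree = [(w, c) for w, c in zip(weights, contribs) if c * sign > 0]
        if not agree:
            merged.append(b)  # nothing survives: fall back to the base value
            continue
        norm = sum(w for w, _ in agree)  # renormalise over agreeing models
        merged.append(b + sum(c for _, c in agree) / norm)
    return merged

# Toy usage with the weights quoted above (0.55 and 0.45).
base = [0.0, 0.0, 1.0]
model_a = [1.0, 0.5, 1.0]   # stands in for the first fine-tune
model_b = [1.0, -1.0, 1.0]  # stands in for the second fine-tune
print(dare_ties_merge(base, [model_a, model_b], [0.55, 0.45], density=0.5))
```

In practice a merge of this kind operates tensor by tensor on the full checkpoints; the per-element loop here is only meant to show the order of operations.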
Performance on Coding Benchmarks
This model was evaluated against its base and source models using llama-server and lm_eval on the HumanEval and MBPP benchmarks. Key findings include:
- HumanEval pass@1: The M1 variant achieved 51.22%, which is 9.15 points below the Qwen3.5-4B base model's 60.37%. The study noted that no merge in the comparison surpassed the base model on HumanEval.
- MBPP pass@1: M1 scored 47.00%, which is 2.00 points above one source (Jackrong-v2 at 45.00%) and 1.20 points below the other (Crow-4B at 48.20%). Other merge variants (M4-v2, M5) performed better on MBPP, suggesting that merging can improve MBPP capability, often at the expense of HumanEval scores.
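To put these percentages in perspective, the sketch below converts the pass@1 scores into approximate problem counts. The benchmark sizes (164 problems for HumanEval, 500 for the MBPP test split) are the standard sizes of those datasets and are an assumption here, since the source does not state which splits were used; the helper names are illustrative.

```python
# Assumed benchmark sizes (not stated in the source).
HUMANEVAL_PROBLEMS = 164
MBPP_PROBLEMS = 500

# pass@1 scores quoted in the findings above, in percent.
scores = {
    "HumanEval": {"M1": 51.22, "Qwen3.5-4B base": 60.37},
    "MBPP": {"M1": 47.00, "Jackrong-v2": 45.00, "Crow-4B": 48.20},
}

def solved(pass_at_1_percent, n_problems):
    """Approximate number of problems solved for a given pass@1 score."""
    return round(pass_at_1_percent / 100 * n_problems)

def gap_vs_m1(benchmark, n_problems):
    """For each other model, return (point gap, problem-count gap) of M1 minus that model."""
    m1 = scores[benchmark]["M1"]
    return {
        name: (round(m1 - s, 2), solved(m1, n_problems) - solved(s, n_problems))
        for name, s in scores[benchmark].items()
        if name != "M1"
    }

print(gap_vs_m1("HumanEval", HUMANEVAL_PROBLEMS))
print(gap_vs_m1("MBPP", MBPP_PROBLEMS))
```

Under these assumptions, the MBPP differences between M1 and its two sources amount to a handful of problems each, so small gaps should be read cautiously.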
Context and Purpose
Qwen3.5-4B-M1-Dare-Ties is primarily a research artifact: it serves as the baseline for a broader study on how merge recipes and importance-signal weighting affect model performance, particularly on coding tasks. Its 32768-token context length is the same across all evaluated variants.