ManniX-ITA/Qwen3.5-4B-M4-ex-LRP
ManniX-ITA/Qwen3.5-4B-M4-ex-LRP is a 4.5 billion parameter language model based on the Qwen3.5-4B architecture, created by ManniX-ITA. It merges two fine-tuned Qwen3.5-4B variants, with the merge weighting determined by an Explainable LRP (Ex-LRP) method. It is part of a study comparing merging recipes, with a focus on coding benchmarks such as MBPP.
Overview
ManniX-ITA/Qwen3.5-4B-M4-ex-LRP is a 4.5 billion parameter model derived from the Qwen3.5-4B base, developed by ManniX-ITA. It is the M4 merge variant in a comparative study of model merging techniques. It was created with an Explainable LRP (Ex-LRP) method, which uses AttnLRP relevance scores to determine merge weighting, as detailed in a mergekit pull request.
Key Characteristics
- Base Model: Qwen/Qwen3.5-4B.
- Source Models: Merged from Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2 and Crownelius/Crow-4B-Opus-4.6-Distill-Heretic_Qwen3.5 with specific weightings (0.55 / 0.45).
- Merging Method: Employs the ex-LRP recipe from mergekit PR #682, using LRP (Layer-wise Relevance Propagation) as the importance signal.
- Performance: While the base Qwen3.5-4B model shows stronger HumanEval performance, the M4-v2 variant of this merge (ManniX-ITA/Qwen3.5-4B-M4-v2-ex-LRP-turbo) achieved the highest MBPP pass@1 score (52.20%) among the tested merges, surpassing both source models.
- Context Length: Supports a context length of 32768 tokens.
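To make the 0.55 / 0.45 weighting concrete, here is a minimal Python sketch of a fixed-weight linear merge on toy parameter lists. It is not the ex-LRP recipe from mergekit PR #682, whose internals are not described here. The function names `linear_merge` and `relevance_weights` are hypothetical, and the way relevance scores rescale the prior weights is an assumption made purely for illustration.

```python
def linear_merge(state_a, state_b, w_a=0.55, w_b=0.45):
    """Weighted average of two parameter dicts (name -> list of floats).

    Illustrative only: a plain linear merge, not the ex-LRP recipe.
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("both models must expose the same parameter names")
    total = w_a + w_b
    if total == 0:
        raise ValueError("weights must not sum to zero")
    merged = {}
    for name in state_a:
        a, b = state_a[name], state_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Normalise by the weight sum so the result stays a convex combination.
        merged[name] = [(w_a * x + w_b * y) / total for x, y in zip(a, b)]
    return merged


def relevance_weights(rel_a, rel_b, prior_a=0.55, prior_b=0.45):
    """Hypothetical rule: rescale prior weights by per-tensor relevance scores.

    Assumption for illustration; the actual importance-to-weight mapping
    used by the recipe may differ.
    """
    score_a = prior_a * rel_a
    score_b = prior_b * rel_b
    total = score_a + score_b
    if total == 0:
        # No relevance signal: fall back to the prior weights.
        return prior_a, prior_b
    return score_a / total, score_b / total


if __name__ == "__main__":
    toy_a = {"layer.weight": [1.0, 0.0]}
    toy_b = {"layer.weight": [0.0, 1.0]}
    print(linear_merge(toy_a, toy_b))
    print(relevance_weights(2.0, 1.0))
```

With equal relevance scores the hypothetical rule reduces to the fixed 0.55 / 0.45 split; a tensor that matters more for one source model shifts the weight toward that model.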
Use Cases
This model is particularly relevant for researchers and developers interested in:
- Model Merging Research: Understanding the impact of LRP-driven merging techniques on model performance.
- Code Generation Tasks: The M4-v2 variant demonstrates improved performance on the MBPP benchmark, suggesting potential for code-related applications where MBPP is a relevant metric.
- Comparative Analysis: It serves as a specific data point in a broader study of different merging recipes (e.g., DARE-TIES, OMv2, Fisher, LRP) and their effects on the Qwen3.5-4B architecture.
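For readers comparing the MBPP numbers above, the following Python sketch shows how a pass@1 score is commonly computed, using the standard unbiased pass@k estimator. The study's actual evaluation harness and sampling settings are not stated here, so the function names and the toy inputs below are assumptions for illustration only.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed all tests, k: budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k=1):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Toy data: one greedy sample per problem, so pass@1 is the solved fraction.
    toy_results = [(1, 1), (1, 0), (1, 1), (1, 1)]
    print(f"pass@1 = {benchmark_pass_at_k(toy_results):.2%}")
```

With a single sample per problem, pass@1 is simply the fraction of problems solved. If the evaluation covered 500 problems (an assumption, not stated in this card), 52.20% would correspond to 261 solved problems.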