Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1
Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1 is a 7.6 billion parameter instruction-tuned causal language model based on the Qwen architecture, created by Optitransfer. This model is a training-free cross-family weight merge of Qwen2.5-7B-Instruct with 8 donor models from 4 different architectural families. It improves performance on reasoning and instruction-following benchmarks such as GSM8K, ARC-Challenge, and IFEval, making it suitable for applications that prioritize logical and instructional capabilities.
Overview
Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1 is a 7.6 billion parameter instruction-tuned model, developed by Optitransfer, that leverages a novel training-free cross-family weight merging technique. This model combines the base Qwen2.5-7B-Instruct with 8 donor models from diverse architectures (Llama, Phi, NeoX, OPT) to enhance specific capabilities without additional fine-tuning or distillation.
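The merge pipeline itself is not reproduced here. As a rough intuition only, a training-free merge can be sketched as a weighted average over shape-compatible tensors, falling back to the anchor's weights wherever a donor's tensor shapes differ (which is the common case across families). The function, tensor names, and `alpha` parameter below are hypothetical illustrations, not the actual pipeline:

```python
import numpy as np

def merge_state_dicts(anchor, donors, alpha=0.1):
    """Illustrative linear merge: for each tensor in the anchor, blend in the
    mean of donor tensors with an identical shape; keep the anchor's weights
    where no donor tensor is shape-compatible. Fully deterministic."""
    merged = {}
    for name, w in anchor.items():
        compatible = [d[name] for d in donors
                      if name in d and d[name].shape == w.shape]
        if compatible:
            donor_mean = np.mean(compatible, axis=0)
            merged[name] = (1 - alpha) * w + alpha * donor_mean
        else:
            # Cross-family shape mismatch: retain the anchor's tensor as-is.
            merged[name] = w
    return merged

# Toy example: one compatible tensor, one with a shape mismatch.
anchor = {"mlp.w": np.ones((2, 2)), "head.w": np.zeros((3,))}
donors = [
    {"mlp.w": np.full((2, 2), 3.0)},
    {"mlp.w": np.full((2, 2), 5.0), "head.w": np.ones((4,))},  # head shape differs
]
merged = merge_state_dicts(anchor, donors, alpha=0.5)
# merged["mlp.w"] is 0.5 * 1.0 + 0.5 * mean(3.0, 5.0) = 2.5 everywhere;
# merged["head.w"] keeps the anchor's zeros.
```

Because the operation is a fixed arithmetic combination of weights with no sampling or training, it is deterministic by construction, which matches the reproducibility claim below.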
Key Capabilities & Differentiators
- Enhanced Reasoning & Instruction Following: Achieves notable gains on benchmarks such as GSM8K (+3.3 pp), ARC-Challenge (+3.2 pp), and IFEval (+2.6 pp) over the unmerged anchor model.
- Cross-Family Merging: Utilizes a pipeline that merges weights across different architectural families, a process conventionally considered infeasible because tensor shapes and layer layouts differ between families.
- Deterministic & Reproducible: The merging process is fully deterministic, ensuring consistent results across runs.
- Broad Compatibility: Compatible with standard HuggingFace inference stacks, including `vLLM`, `llama.cpp` (after GGUF conversion), and `text-generation-inference`.
Limitations
- Code Generation Regression: Shows a regression of 6.10 pp on HumanEval, as the donor pool was intentionally reasoning-heavy and code-light.
- Mild MMLU Regression: Experiences a slight decrease in MMLU performance (-0.86 pp), indicating a trade-off for concentrated instruction-following and reasoning abilities.
Intended Use
This model is ideal for research and evaluation of cross-family weight-merging techniques. It serves as a drop-in replacement for Qwen/Qwen2.5-7B-Instruct in workflows where improved reasoning and instruction-following are prioritized over code generation.
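Since the merge preserves the Qwen2.5 architecture and tokenizer, the checkpoint loads exactly like the original model. A minimal sketch using `transformers` (the prompt and generation settings are illustrative, not tuned recommendations):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Standard Qwen2.5 chat-template usage; any instruction prompt works here.
messages = [{"role": "user",
             "content": "A train covers 60 km in 45 minutes. "
                        "What is its average speed in km/h?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Any pipeline already built around `Qwen/Qwen2.5-7B-Instruct` should work unchanged after swapping the model ID.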