Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1

Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: May 1, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1 is a 7.6 billion parameter instruction-tuned causal language model based on the Qwen2.5 architecture, created by Optitransfer. It is a training-free cross-family weight merge of Qwen2.5-7B-Instruct with 8 donor models drawn from 4 different architectural families. The merge improves on the unmerged anchor across reasoning and instruction-following benchmarks such as GSM8K, ARC-Challenge, and IFEval, making it suitable for applications that prioritize logical and instruction-following capability.


Overview

Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1 is a 7.6 billion parameter instruction-tuned model, developed by Optitransfer, that leverages a novel training-free cross-family weight merging technique. This model combines the base Qwen2.5-7B-Instruct with 8 donor models from diverse architectures (Llama, Phi, NeoX, OPT) to enhance specific capabilities without additional fine-tuning or distillation.
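
Optitransfer has not published the merge pipeline itself. As a rough illustration of the training-free idea, the sketch below performs a plain same-shape linear interpolation between an anchor and a single donor; a real cross-family merge would additionally need a layer- and tensor-alignment step to map between architectures, which is not shown. The donor repo name and the `ALPHA` blend weight are placeholders, not values from this model card.

```python
# Illustrative sketch only -- NOT the Optitransfer pipeline.
# Training-free linear merge of two checkpoints, applied to tensors
# whose names and shapes already match. A cross-family merge would
# need an extra alignment/mapping step between architectures.
import torch
from transformers import AutoModelForCausalLM

ANCHOR = "Qwen/Qwen2.5-7B-Instruct"
DONOR = "some-org/donor-model"   # placeholder: any HF causal LM repo
ALPHA = 0.9                      # assumed blend weight kept on the anchor

anchor = AutoModelForCausalLM.from_pretrained(ANCHOR, torch_dtype=torch.bfloat16)
donor = AutoModelForCausalLM.from_pretrained(DONOR, torch_dtype=torch.bfloat16)

merged = anchor.state_dict()
for name, tensor in donor.state_dict().items():
    # Only blend parameters that exist in both models with identical shapes;
    # everything else is kept from the anchor unchanged.
    if name in merged and merged[name].shape == tensor.shape:
        merged[name] = ALPHA * merged[name] + (1.0 - ALPHA) * tensor

anchor.load_state_dict(merged)
anchor.save_pretrained("merged-sketch")
```

Like the process described above, this kind of merge is fully deterministic: given the same checkpoints and blend weights, it produces identical tensors on every run.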

Key Capabilities & Differentiators

  • Enhanced Reasoning & Instruction Following: Achieves notable gains on benchmarks such as GSM8K (+3.3 pp), ARC-Challenge (+3.2 pp), and IFEval (+2.6 pp) over the unmerged anchor model.
  • Cross-Family Merging: Uses a pipeline that merges weights across different architectural families, a process typically considered infeasible because the families' internal structures differ.
  • Deterministic & Reproducible: The merging process is fully deterministic, ensuring consistent results across runs.
  • Broad Compatibility: Works with standard HuggingFace inference stacks, including vLLM, llama.cpp (after GGUF conversion), and text-generation-inference; see the loading snippet after this list.
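
A minimal transformers loading example follows; the prompt and generation settings are illustrative, not recommendations from the model card.

```python
# Load the merged model through the standard transformers API and run
# a single chat-formatted generation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "A train leaves at 3pm travelling 60 mph. How far has it gone by 5:30pm?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```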

Limitations

  • Code Generation Regression: HumanEval drops 6.10 pp relative to the anchor, as the donor pool was intentionally reasoning-heavy and code-light.
  • Mild MMLU Regression: MMLU decreases slightly (-0.86 pp), the trade-off for concentrating the donor pool on instruction following and reasoning.

Intended Use

This model is ideal for research and evaluation of cross-family weight-merging techniques. It serves as a drop-in replacement for Qwen/Qwen2.5-7B-Instruct in workflows where improved reasoning and instruction-following are prioritized over code generation.
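
For serving it in place of the anchor, a minimal vLLM offline-inference sketch is shown below; the sampling parameters are illustrative assumptions, and `max_model_len` simply matches the 32k context length listed above.

```python
# Serve the merge as a drop-in for Qwen/Qwen2.5-7B-Instruct using
# vLLM's offline inference API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1",
    max_model_len=32768,  # matches the advertised 32k context window
)
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Explain chain-of-thought prompting in two sentences."], params
)
print(outputs[0].outputs[0].text)
```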