Overview
prithivMLmods/Qwen2.5-32B-DeepSeek-R1-Instruct is a 32.8-billion-parameter language model created by prithivMLmods. It was produced by merging several Qwen2.5-based models with the TIES merge method via MergeKit.
Merge Details
This model uses Qwen/Qwen2.5-32B-Instruct as its foundational base. It strategically combines two distinct models:
- Qwen/QwQ-32B-Preview
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
Both merged models contribute with equal weight and density, with the aim of combining their respective strengths. The merge configuration also enables weight normalization, int8 masking, and bfloat16 precision, settings intended to preserve model quality after the merge.
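A MergeKit configuration consistent with this description might look like the following. This is a hedged reconstruction from the settings stated above, not the published config; the exact weight and density values are assumptions.

```yaml
# Hypothetical mergekit config reconstructed from the description above;
# the published configuration may use different weight/density values.
models:
  - model: Qwen/QwQ-32B-Preview
    parameters:
      weight: 1.0
      density: 1.0
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
    parameters:
      weight: 1.0
      density: 1.0
merge_method: ties
base_model: Qwen/Qwen2.5-32B-Instruct
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
```

Saved as e.g. `config.yml`, such a file would be run with `mergekit-yaml config.yml ./output-dir`.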
Key Characteristics
- Architecture: Based on the Qwen2.5-32B-Instruct family.
- Parameter Count: 32.8 billion parameters.
- Context Length: Supports a context length of 131,072 tokens.
- Merge Method: Utilizes the TIES (TrIm, Elect Sign & Merge) method for combining model weights.
- Precision: Optimized for bfloat16 operations.
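The TIES procedure named above can be sketched on toy weight vectors. This is a minimal illustration of the three steps (trim low-magnitude deltas, elect a per-parameter sign, merge only agreeing values), not MergeKit's actual implementation; all names here are illustrative.

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5, lam=1.0):
    """Toy TIES merge: trim, elect sign, then a disjoint mean of agreeing deltas."""
    # Task vectors: each fine-tuned model's delta from the shared base.
    tvs = [ft - base for ft in finetuned]
    # Trim: keep only the top-`density` fraction of each delta by magnitude.
    trimmed = []
    for tv in tvs:
        k = max(1, int(round(density * tv.size)))
        thresh = np.sort(np.abs(tv))[::-1][k - 1]
        trimmed.append(np.where(np.abs(tv) >= thresh, tv, 0.0))
    # Elect sign: per-parameter sign of the magnitude-weighted sum.
    elected = np.sign(sum(trimmed))
    # Merge: average only the values whose sign agrees with the elected sign.
    agree = [np.where(np.sign(tv) == elected, tv, 0.0) for tv in trimmed]
    counts = np.maximum(sum(np.abs(np.sign(a)) for a in agree), 1.0)
    merged_tv = sum(agree) / counts
    return base + lam * merged_tv

base = np.zeros(4)
model_a = np.array([0.9, -0.1, 0.5, 0.0])
model_b = np.array([0.8, 0.7, -0.6, 0.0])
print(ties_merge(base, [model_a, model_b], density=0.5))
```

Note how the second and third parameters each survive from only one model: the conflicting, lower-magnitude delta is dropped rather than averaged in, which is the interference-resolution idea behind TIES.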
Intended Use
This model is suitable for developers and researchers who want a powerful instruction-following model that integrates the capabilities of multiple high-performing base models. Its merged nature suggests balanced performance across a range of tasks, drawing on the diverse training data and architectural nuances of its component models.