prithivMLmods/Qwen2.5-32B-DeepSeek-R1-Instruct

Text generation · Concurrency cost: 2 · Model size: 32.8B · Quant: FP8 · Context length: 32k · Architecture: Transformer

prithivMLmods/Qwen2.5-32B-DeepSeek-R1-Instruct is a 32.8-billion-parameter merged language model built on the Qwen2.5-32B-Instruct base using the TIES merge method. It integrates QwQ-32B-Preview and DeepSeek-R1-Distill-Qwen-32B, with weight normalization and int8 masking enabled in the merge configuration, and aims to combine the strengths of its constituent models for instruction-following tasks.


Overview

prithivMLmods/Qwen2.5-32B-DeepSeek-R1-Instruct is a 32.8-billion-parameter language model created by prithivMLmods. It was produced by merging models with the TIES method via MergeKit.

Merge Details

This model uses Qwen/Qwen2.5-32B-Instruct as its base and merges two models into it:

  • Qwen/QwQ-32B-Preview
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Both merged models contribute with equal weight and density, aiming to synthesize their respective strengths. The merge configuration also enables weight normalization, int8 masking, and bfloat16 precision, which keep the combined weights well-scaled and the merge itself memory-efficient.
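A merge like the one described above would correspond to a MergeKit configuration along these lines. This is a reconstruction, not the author's published config; the exact `density` and `weight` values are assumptions based on the equal-weight description:

```yaml
# Hypothetical MergeKit config for the described TIES merge.
# Parameter values are illustrative and may differ from the author's.
models:
  - model: Qwen/QwQ-32B-Preview
    parameters:
      density: 1.0
      weight: 1.0
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
    parameters:
      density: 1.0
      weight: 1.0
merge_method: ties
base_model: Qwen/Qwen2.5-32B-Instruct
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
```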

Key Characteristics

  • Architecture: Based on the Qwen2.5-32B-Instruct family.
  • Parameter Count: 32.8 billion parameters.
  • Context Length: Supports a context length of 131,072 tokens.
  • Merge Method: Utilizes the TIES (TrIm, Elect Sign & Merge) method for combining model weights.
  • Precision: Optimized for bfloat16 operations.
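The three TIES steps — trim each task vector, elect a per-parameter sign, then merge only the agreeing values — can be sketched on toy weight vectors. This is an illustration of the algorithm, not MergeKit's actual implementation:

```python
import numpy as np

def ties_merge(base, task_models, density=0.5, lam=1.0):
    """Toy TIES merge over flat parameter vectors.

    base: 1-D array of base-model weights.
    task_models: list of 1-D arrays of fine-tuned weights.
    density: fraction of each task vector kept after trimming.
    lam: scaling applied to the merged task vector.
    """
    deltas = [m - base for m in task_models]

    # 1. Trim: keep only the top-`density` fraction of each delta by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(round(density * d.size)))
        thresh = np.sort(np.abs(d))[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))

    # 2. Elect sign: majority sign of the summed trimmed deltas per parameter.
    stacked = np.stack(trimmed)
    elected = np.sign(stacked.sum(axis=0))

    # 3. Disjoint merge: average only the values that agree with the elected sign.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged_delta = (stacked * agree).sum(axis=0) / counts

    return base + lam * merged_delta
```

On two toy models that disagree in sign on one parameter, that parameter's conflicting updates cancel at the sign-election step instead of being averaged into noise.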

Intended Use

This model is suitable for developers and researchers who want a powerful instruction-following model that integrates the capabilities of multiple high-performing base models. Its merged nature suggests balanced performance across a range of tasks, benefiting from the diverse training data and architectural nuances of its components.
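A minimal way to load and prompt the model with the Hugging Face `transformers` library might look like the sketch below. The helper names and generation parameters are illustrative, and running the demo requires a GPU with enough memory for a 32B bfloat16 model:

```python
# Usage sketch; MODEL_ID is the card's repo, everything else is illustrative.
MODEL_ID = "prithivMLmods/Qwen2.5-32B-DeepSeek-R1-Instruct"

def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Chat-format messages consumed by the tokenizer's chat template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def run_demo() -> str:
    # Imported lazily so build_messages stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="bfloat16", device_map="auto"
    )
    messages = build_messages(
        "You are a helpful assistant.",
        "Explain the TIES merge method in two sentences.",
    )
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Call `run_demo()` to actually download and query the model; the function is left uninvoked here so the sketch can be read without triggering a multi-gigabyte download.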