netcat420/DEFUNCT-EXPERIMENT2_1

Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jan 23, 2025 | Architecture: Transformer

netcat420/DEFUNCT-EXPERIMENT2_1 is a 7.6 billion parameter language model created by netcat420, merged using the SLERP method from netcat420/DeepSeek-R1-MFANN-TIES-unretrained-7b and netcat420/Qwen2.5-DeepSeek-R1-MFANN-7b. It supports a 32768-token context length and is intended for general language tasks, drawing on the characteristics of both source models.


Overview

DEFUNCT-EXPERIMENT2_1 is a 7.6 billion parameter language model developed by netcat420 by merging two pre-trained models: netcat420/DeepSeek-R1-MFANN-TIES-unretrained-7b and netcat420/Qwen2.5-DeepSeek-R1-MFANN-7b. It has a 32768-token context window, which suits longer inputs and extended outputs.

Merge Details

This model was built with the SLERP (Spherical Linear Interpolation) merge method, which interpolates between two models' weights along an arc rather than a straight line and tends to preserve the scale of the weights better than plain averaging. The merge covered layers 0 through 28 of both source models, and the configuration set separate interpolation parameters for the self-attention and MLP components instead of applying one blend ratio everywhere.
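The interpolation described above can be sketched in a few lines of Python. This is a minimal illustration of SLERP on flat lists of floats, not the code used to produce this model. The helper names (`slerp`, `pick_t`) and the interpolation values for `self_attn` and `mlp` are hypothetical, chosen only to show how different components can receive different blend factors.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    t = 0 returns v0, t = 1 returns v1. Falls back to plain linear
    interpolation when the vectors are (anti)parallel or one is zero.
    """
    def lerp():
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        return lerp()

    # Angle between the two vectors, clamped for numerical safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        return lerp()

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def pick_t(param_name, t_by_filter, default_t):
    """Choose a blend factor from the first filter found in the parameter name."""
    for key, t in t_by_filter.items():
        if key in param_name:
            return t
    return default_t


# Hypothetical blend factors; the model's real configuration is not shown here.
T_BY_FILTER = {"self_attn": 0.3, "mlp": 0.7}
DEFAULT_T = 0.5

name = "model.layers.0.self_attn.q_proj.weight"  # illustrative parameter name
t = pick_t(name, T_BY_FILTER, DEFAULT_T)
merged = slerp(t, [1.0, 0.0], [0.0, 1.0])
print(name, t, merged)
```

For two orthogonal unit vectors, `slerp(0.5, [1.0, 0.0], [0.0, 1.0])` gives approximately `[0.7071, 0.7071]`, which still has unit length. A plain average would give `[0.5, 0.5]`, which is shorter than either input.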

Key Characteristics

  • Merged Architecture: Combines weights from netcat420/DeepSeek-R1-MFANN-TIES-unretrained-7b and netcat420/Qwen2.5-DeepSeek-R1-MFANN-7b.
  • Parameter Count: 7.6 billion parameters, a mid-sized model that trades some capability for lower compute and memory cost.
  • Context Length: Supports a 32768-token context, enough for long documents or multi-turn conversations.
  • Merge Method: Uses SLERP to interpolate between the weights of the two source models.
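As a rough guide to what the parameter count means in practice, the sketch below estimates the memory needed for the weights alone. It assumes 1 byte per parameter for FP8 (the quantization listed for this model) and 2 bytes for FP16/BF16. It ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 7.6e9  # parameter count stated for this model

for label, nbytes in (("FP8", 1), ("FP16/BF16", 2)):
    print(f"{label}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB of weights")
```

Under these assumptions the weights take about 7.6 GB at FP8 and about 15.2 GB at 16-bit precision.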

Potential Use Cases

Given its merged nature and 32768-token context window, DEFUNCT-EXPERIMENT2_1 is likely suitable for general language understanding and generation tasks that benefit from a blend of its source models' capabilities. Examples include comprehension of long texts and generation of coherent, extended responses.
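When working with long inputs, it helps to check how much of the context window remains for the response. The sketch below assumes that prompt tokens and generated tokens share the same 32768-token window, as is usual for decoder-only transformers. The function names are illustrative, and token counts would come from whichever tokenizer is used to serve the model.

```python
CONTEXT_LENGTH = 32768  # context length stated for this model


def remaining_tokens(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_length - prompt_tokens)


def fits_in_context(prompt_tokens, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Check whether the prompt plus the requested output fits in the window."""
    return prompt_tokens + max_new_tokens <= context_length


# Example: a 30000-token document leaves 2768 tokens for the response.
print(remaining_tokens(30000))
print(fits_in_context(30000, 4096))
```

If the check fails, the usual options are to shorten or chunk the input, or to request fewer output tokens.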