Overview
Mojo7/Katkut-3B is a merged language model developed by Mojo7 using the SLERP (Spherical Linear Interpolation) merge method. It combines two base models: Mojo7/Katkut-3B and Qwen/Qwen2.5-3B-Instruct.
Merge Details
The merge combines layers from both models over the layer range [0, 28]. Qwen/Qwen2.5-3B-Instruct served as the base_model, anchoring the resulting architecture. The interpolation factor t was set separately for the self_attn and mlp layers, so the attention and feed-forward weights each draw on the two parents in different proportions; a sketch of the interpolation follows.
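For context, SLERP interpolates between corresponding weight tensors along the arc between them rather than along a straight line. Below is a minimal PyTorch sketch of that interpolation; it is illustrative only, not the mergekit implementation, and the per-component t values at the bottom are hypothetical placeholders, since this card does not publish the actual settings.

```python
import torch

def slerp(t: float, w0: torch.Tensor, w1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    Treats each tensor as a flat vector and interpolates along the arc:
        slerp(t) = sin((1-t)*omega)/sin(omega) * w0 + sin(t*omega)/sin(omega) * w1,
    where omega is the angle between the two vectors.
    """
    v0, v1 = w0.flatten().float(), w1.flatten().float()
    # Angle between the two weight vectors.
    cos_omega = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    omega = torch.acos(cos_omega.clamp(-1.0, 1.0))
    sin_omega = torch.sin(omega)
    if sin_omega.abs() < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        merged = (1.0 - t) * v0 + t * v1
    else:
        merged = (torch.sin((1.0 - t) * omega) / sin_omega) * v0 \
               + (torch.sin(t * omega) / sin_omega) * v1
    return merged.reshape(w0.shape).to(w0.dtype)

# Hypothetical per-component t values (not the published settings):
T_SELF_ATTN = 0.3  # attention weights lean toward the first parent
T_MLP = 0.7        # MLP weights lean toward the second parent
```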
Key Characteristics
- Hybrid Architecture: Draws on the combined strengths of Mojo7/Katkut-3B and Qwen/Qwen2.5-3B-Instruct in a single set of weights.
- SLERP Method: Interpolates corresponding weight tensors along the arc between them rather than a straight line, which tends to preserve their scale better than plain averaging.
- Parameter Blending: Separate t values for the attention (self_attn) and MLP layers control how much each parent contributes per component; see the configuration sketch after this list.
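To make the per-component blending concrete, here is a hypothetical mergekit-style configuration, expressed as a Python dict for illustration (mergekit itself consumes the equivalent YAML). The structure mirrors the details in this card: slerp as the merge method, layer range [0, 28], Qwen/Qwen2.5-3B-Instruct as base_model, and t filters for self_attn and mlp. The numeric t values and the dtype are placeholders, not published settings.

```python
# Hypothetical merge configuration mirroring the details above.
# t values and dtype are placeholders, not the published settings.
merge_config = {
    "merge_method": "slerp",
    "base_model": "Qwen/Qwen2.5-3B-Instruct",
    "slices": [
        {
            "sources": [
                {"model": "Mojo7/Katkut-3B", "layer_range": [0, 28]},
                {"model": "Qwen/Qwen2.5-3B-Instruct", "layer_range": [0, 28]},
            ],
        }
    ],
    "parameters": {
        "t": [
            {"filter": "self_attn", "value": 0.3},  # placeholder
            {"filter": "mlp", "value": 0.7},        # placeholder
            {"value": 0.5},                         # default for all other tensors
        ],
    },
    "dtype": "bfloat16",  # assumed; not stated in this card
}
```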
Potential Use Cases
This merged model is suited to general-purpose language generation and understanding tasks where a balance between the reasoning ability and the stylistic traits of its constituent models is desired, particularly applications that call for both logical coherence and the distinctive linguistic patterns of its parents. A minimal loading example follows.
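A minimal usage sketch with the standard transformers API, assuming the model is published on the Hugging Face Hub under Mojo7/Katkut-3B and follows the usual Qwen2.5 chat template; the prompt is just an example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the model is available on the Hugging Face Hub under this ID.
model_id = "Mojo7/Katkut-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize SLERP model merging in one sentence."}
]
# Build the chat-formatted prompt and generate a reply.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```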