lemon07r/Qwen3-R1-SLERP-Q3T-8B

Hugging Face · Text Generation

  • Model Size: 8B
  • Quant: FP8
  • Ctx Length: 32k
  • Concurrency Cost: 1
  • Published: Jun 19, 2025
  • Architecture: Transformer

lemon07r/Qwen3-R1-SLERP-Q3T-8B is an 8-billion-parameter language model, a 50/50 SLERP merge of DeepSeek-R1-0528-Qwen3-8B and Qwen3-8B, both built on the Qwen3 architecture. The merge uses the Qwen tokenizer, which the creator's testing indicates is more efficient and more accurate than the DeepSeek tokenizer for this merge. The model targets general text generation and reasoning tasks, and initial testing suggests it improves on both of its parent models.

Model Overview

lemon07r/Qwen3-R1-SLERP-Q3T-8B is an 8-billion-parameter model created by lemon07r through a 50/50 SLERP (Spherical Linear Interpolation) merge of two Qwen3-8B-based models: DeepSeek-R1-0528-Qwen3-8B and Qwen3-8B. The merge was motivated by the parent models' shared architecture and base training, with the aim of combining their strengths.
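
SLERP interpolates along the arc between two weight vectors instead of the straight line used by plain weight averaging, which is the usual argument for why it merges models more gracefully. Below is a minimal, illustrative sketch of the interpolation in Python; the actual merge was produced with a merge toolkit, and the near-parallel fallback to linear interpolation follows common practice rather than anything stated on the model card.

```python
import torch

def slerp(v0: torch.Tensor, v1: torch.Tensor, t: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t=0.5 corresponds to the 50/50 merge used for this model.
    """
    # Normalize flattened copies to measure the angle between the two tensors.
    u0 = v0.flatten() / (v0.norm() + eps)
    u1 = v1.flatten() / (v1.norm() + eps)
    dot = torch.clamp(torch.dot(u0, u1), -1.0, 1.0)
    omega = torch.arccos(dot)  # angle between the two weight directions

    # Nearly parallel tensors: fall back to plain linear interpolation,
    # since dividing by sin(omega) would be numerically unstable here.
    if omega.abs() < 1e-4:
        return (1.0 - t) * v0 + t * v1

    sin_omega = torch.sin(omega)
    w0 = torch.sin((1.0 - t) * omega) / sin_omega
    w1 = torch.sin(t * omega) / sin_omega
    return (w0 * v0.flatten() + w1 * v1.flatten()).reshape(v0.shape)
```

Applied parameter by parameter across the two checkpoints, this keeps the interpolated weights on the arc between the originals rather than shrinking them toward the midpoint, as straight averaging does.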

Key Characteristics

  • Architecture: Based on the Qwen3-8B architecture, inheriting its foundational capabilities.
  • Merge Method: Uses the SLERP merge method, which the creator notes often yields better results than other merging techniques.
  • Tokenizer: Employs the Qwen tokenizer, which in comparative testing performed better and used fewer tokens for equivalent output than the DeepSeek tokenizer (see the token-count sketch after this list).
  • Performance: Initial testing, including higher-precision runs contributed by another user, indicates that this SLERP merge (the Q3T variant, which uses the Qwen tokenizer) outperforms both of its parent models.
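
One way to reproduce that kind of tokenizer comparison is to tokenize identical text with each parent's tokenizer and count the tokens. The sketch below is an assumption about method, not the creator's exact procedure; the repo IDs are the two parent models named above.

```python
from transformers import AutoTokenizer

# Tokenizers of the two parent models in the merge.
qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
deepseek_tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")

sample = (
    "Explain why spherical interpolation can outperform "
    "linear averaging when merging model weights."
)

# Fewer tokens for the same text means cheaper generation and a
# longer effective context window.
print(f"Qwen tokenizer:     {len(qwen_tok.encode(sample))} tokens")
print(f"DeepSeek tokenizer: {len(deepseek_tok.encode(sample))} tokens")
```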

Use Cases and Strengths

This model is well-suited to general text generation and reasoning tasks, where the combined strengths of its parent models can be leveraged. Its improved performance over either parent suggests it could serve as a strong base for further fine-tuning. Its development also illustrates an experimental approach to comparing tokenizer effectiveness in merged models.
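
Because the merge keeps the standard Qwen3 architecture, it should load with the ordinary transformers text-generation workflow. A minimal sketch follows; the generation settings are illustrative defaults, not recommendations from the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lemon07r/Qwen3-R1-SLERP-Q3T-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# The merge inherits the Qwen3-style chat template from its parents.
messages = [{"role": "user", "content": "Summarize the idea behind a SLERP model merge."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```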