TheTsar1209/qwen-carpmuscle-r-v0.3
TheTsar1209/qwen-carpmuscle-r-v0.3 is a 14.8-billion-parameter language model based on the Qwen2.5 architecture, developed by TheTsar1209. The model was created with Rombodawg's Shared Continuous Finetuning method, merging a continuously pretrained Qwen2.5-14B model with Qwen2.5-14B-Instruct and Qwen2.5-14B via the TIES merging technique. It supports a context length of 131,072 tokens and is designed for general text generation across multiple languages, including Chinese, English, French, Spanish, and more.
Model Overview
TheTsar1209/qwen-carpmuscle-r-v0.3 is a 14.8-billion-parameter language model developed by TheTsar1209, built from the Qwen2.5-14B and Qwen2.5-14B-Instruct base models. It was produced with Rombodawg's Shared Continuous Finetuning method: Unsloth's optimized `Qwen2.5-14B-bnb-4bit` was continuously pretrained on ChatML-formatted data at a 24k context length, and the result was then TIES-merged with the instruct and base Qwen2.5-14B models.
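The author's exact merge configuration is not reproduced here, but a TIES merge of the shape described above is typically written as a `mergekit` YAML config along the following lines. The model identifiers follow the description above; the `weight`/`density` values are illustrative assumptions, not published settings:

```yaml
# Illustrative mergekit TIES config for the merge described above.
# weight/density values are assumptions, not the author's published settings.
models:
  - model: TheTsar1209/qwen-carpmuscle-v0.3   # continuously pretrained component
    parameters:
      weight: 1.0
      density: 1.0
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      weight: 1.0
      density: 1.0
merge_method: ties
base_model: Qwen/Qwen2.5-14B
parameters:
  normalize: true
dtype: bfloat16
```

A config like this is run with `mergekit-yaml config.yml ./qwen-carpmuscle-r-v0.3`.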
Key Characteristics
- Architecture: Based on the Qwen2.5 family, leveraging both base and instruct variants.
- Merging Technique: Employs the TIES method (TrIm, Elect Sign & Merge) via `mergekit` to combine different model checkpoints.
- Multilingual Support: Capable of handling text generation in numerous languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
- Training Optimization: The underlying `qwen-carpmuscle-v0.3` component was trained 2x faster using Unsloth and Hugging Face's TRL library (see the sketch after this list).
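As a rough illustration of that training setup, the sketch below shows a typical Unsloth + TRL continued-pretraining loop over ChatML-formatted text at a 24k context. The dataset path, LoRA rank, and hyperparameters are placeholders rather than the author's actual recipe, and the `SFTTrainer` argument surface varies somewhat across TRL versions:

```python
# Hypothetical Unsloth + TRL continued-pretraining sketch (not the author's recipe).
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base named in the model card at a ~24k training context.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-14B-bnb-4bit",
    max_seq_length=24576,
    load_in_4bit=True,
)

# Attach LoRA adapters so the quantized base can be trained efficiently.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # illustrative LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder corpus: pre-formatted ChatML strings in a "text" column.
dataset = load_dataset("json", data_files="chatml_corpus.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,            # `processing_class` in newer TRL releases
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=24576,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        output_dir="qwen-carpmuscle-v0.3",
    ),
)
trainer.train()
```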
Performance Highlights
Evaluated on the Open LLM Leaderboard, the model shows:
- IFEval (0-shot): 44.55 strict accuracy
- BBH (3-shot): 46.38 normalized accuracy
- MMLU-PRO (5-shot): 45.59 accuracy
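These figures can be approximated locally with EleutherAI's lm-evaluation-harness. The minimal sketch below assumes a harness version that ships the Open LLM Leaderboard v2 `leaderboard_*` tasks, whose few-shot counts are built into the task configs:

```python
# Sketch: re-running the leaderboard tasks with lm-evaluation-harness.
# Assumes the `leaderboard_*` task names from Open LLM Leaderboard v2.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=TheTsar1209/qwen-carpmuscle-r-v0.3,dtype=bfloat16",
    tasks=["leaderboard_ifeval", "leaderboard_bbh", "leaderboard_mmlu_pro"],
    batch_size="auto",
)
print(results["results"])
```

Locally obtained scores may differ slightly from the leaderboard's, which pins specific harness and hardware settings.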
Use Cases
This model is suited to general text generation tasks where a blend of capabilities from the base and instruction-tuned Qwen2.5 models is desired, particularly in multilingual contexts. The merge is explicitly aimed at combining the strengths of the different Qwen2.5 variants it incorporates.
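For reference, a minimal inference sketch with Hugging Face `transformers` is shown below; the prompt and generation settings are illustrative, not tuned recommendations. The Qwen2.5 tokenizer ships a ChatML chat template, matching the format the model was trained on:

```python
# Minimal chat inference sketch; generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheTsar1209/qwen-carpmuscle-r-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A French prompt, exercising the multilingual support described above.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explique la technique de fusion TIES en deux phrases."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```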