TheTsar1209/qwen-carpmuscle-r-v0.3

Text generation · Concurrency cost: 1 · Model size: 14.8B · Quant: FP8 · Ctx length: 32k · Published: Oct 23, 2024 · Architecture: Transformer

TheTsar1209/qwen-carpmuscle-r-v0.3 is a 14.8 billion parameter language model based on the Qwen2.5 architecture, developed by TheTsar1209. This model was created using Rombodawg's Shared Continuous Finetuning method, merging a continuously pretrained Qwen2.5-14B model with Qwen2.5-14B-Instruct and Qwen2.5-14B using the TIES merging technique. It supports a context length of 131072 tokens and is designed for general text generation tasks across multiple languages including Chinese, English, French, Spanish, and more.


Model Overview

TheTsar1209/qwen-carpmuscle-r-v0.3 is a 14.8 billion parameter language model developed by TheTsar1209, built from the Qwen2.5-14B and Qwen2.5-14B-Instruct models via Rombodawg's Shared Continuous Finetuning method. The recipe has two stages: continuous pretraining on ChatML-formatted data at a 24k context length, starting from Unsloth's optimized Qwen2.5-14B-bnb-4bit checkpoint, followed by a TIES merge of the resulting model with Qwen2.5-14B-Instruct and the Qwen2.5-14B base.
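Because training used the ChatML format, inference prompts should follow the same layout. A minimal sketch of that format (the helper name and the choice to leave the assistant turn open for completion are illustrative, not from the model card):

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt.

    ChatML wraps each turn in <|im_start|>{role}\n ... <|im_end|> markers.
    """
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model generates the reply.
    prompt += "<|im_start|>assistant\n"
    return prompt
```

In practice, the tokenizer's built-in chat template (e.g. `tokenizer.apply_chat_template` in transformers) produces this layout for Qwen2.5-family models, so manual formatting is only needed when bypassing that machinery.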

Key Characteristics

  • Architecture: Based on the Qwen2.5 family, leveraging both base and instruct variants.
  • Merging Technique: Employs the TIES method (TrIm, Elect Sign & Merge) via mergekit to combine different model checkpoints.
  • Multilingual Support: Capable of handling text generation in numerous languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
  • Training Optimization: The underlying qwen-carpmuscle-v0.3 component was trained 2x faster using Unsloth together with Hugging Face's TRL library.
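The TIES merge named above proceeds per tensor in three steps: trim each fine-tuned model's delta from the base, elect a sign per parameter, then average only the agreeing deltas. A minimal NumPy illustration (function name, density value, and shapes are assumptions for the sketch; mergekit's actual implementation differs in details):

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """Sketch of TIES merging (Trim, Elect Sign, Merge) for one tensor."""
    # 1. Trim: keep only the top-`density` fraction of each task vector
    #    (delta from the base) by magnitude; zero out the rest.
    deltas = []
    for w in finetuned:
        d = w - base
        k = max(1, int(density * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k]
        deltas.append(np.where(np.abs(d) >= thresh, d, 0.0))

    # 2. Elect sign: per parameter, take the sign of the summed deltas
    #    (the direction with the larger total magnitude wins).
    stacked = np.stack(deltas)
    sign = np.sign(np.sum(stacked, axis=0))
    sign[sign == 0] = 1.0

    # 3. Merge: average only the nonzero deltas that agree with the
    #    elected sign, then add the result back onto the base weights.
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    summed = np.where(agree, stacked, 0.0).sum(axis=0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return base + summed / counts
```

Trimming reduces interference between checkpoints, and sign election prevents conflicting updates from cancelling each other out, which is why TIES tends to preserve more of each parent model's behavior than plain averaging.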

Performance Highlights

Evaluated on the Open LLM Leaderboard, the model shows:

  • IFEval (0-Shot): 44.55 strict accuracy
  • BBH (3-Shot): 46.38 normalized accuracy
  • MMLU-PRO (5-shot): 45.59 accuracy

Use Cases

This model is suitable for general text generation tasks where a blend of capabilities from the base and instruction-tuned Qwen2.5 models is desired, particularly in multilingual contexts. The merge is intended to combine the instruct model's instruction following with the base model's raw generation strengths.