Nomadv13/M-CLU-v1: Qwen2.5 Base Model
Nomadv13/M-CLU-v1 is a 0.5-billion-parameter base causal language model from the Qwen2.5 series, developed by the Qwen Team. It builds on the Qwen2 architecture and incorporates improvements across several key areas. This model is not recommended for direct conversational use; instead, it serves as a robust foundation for post-training, such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), or continued pretraining.
Key Capabilities & Improvements (vs. Qwen2)
- Enhanced Knowledge & Reasoning: Significantly improved capabilities in coding and mathematics, leveraging specialized expert models.
- Instruction Following: Substantial advancements in adhering to instructions and generating structured outputs, particularly JSON.
- Long Text Generation: Improved ability to generate texts exceeding 8,000 tokens.
- Structured Data Understanding: Better comprehension of structured data formats, such as tables.
- System Prompt Resilience: More robust handling of diverse system prompts, which benefits role-play and condition-setting for chatbots.
- Multilingual Support: Supports over 29 languages, including major global languages like Chinese, English, French, Spanish, German, and Japanese.
- Context Length: Features a full 32,768 token context window.
Architecture & Technical Specifications
This model uses a transformer architecture with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings. It comprises 24 layers, each with 14 query heads and 2 key-value heads under grouped-query attention (GQA). The model has 0.49 billion total parameters, of which 0.36 billion are non-embedding parameters.
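As a rough illustration of what the GQA configuration buys, the sketch below estimates the fp16 KV-cache footprint at the full 32,768-token context, using the layer and head counts from this card. The head dimension of 64 is an assumption, not stated here:

```python
# Estimate KV-cache memory for the specs given in the card:
# 24 layers, 2 KV heads (GQA, shared by 14 query heads), 32,768-token context.
# head_dim = 64 is an assumption; it is not stated in this card.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Total bytes for cached keys + values across all layers (fp16 by default)."""
    per_token = 2 * num_kv_heads * head_dim * bytes_per_value  # 2 = one K and one V
    return num_layers * per_token * seq_len

gqa = kv_cache_bytes(24, 2, 64, 32768)
mha = kv_cache_bytes(24, 14, 64, 32768)  # hypothetical: one KV head per query head
print(f"GQA: {gqa / 2**20:.0f} MiB, full MHA: {mha / 2**20:.0f} MiB, ratio: {mha // gqa}x")
```

Under these assumptions, sharing 2 KV heads among 14 query heads shrinks the cache by a factor of 7 relative to giving every query head its own keys and values.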
Intended Use
This base model is primarily intended for developers and researchers who wish to perform further fine-tuning or adaptation for specific downstream tasks. It provides a strong pre-trained foundation for building specialized language models.
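A minimal starting point for such adaptation, assuming the checkpoint is published on the Hugging Face Hub under the id shown in this card and that the `transformers` library is installed:

```python
MODEL_ID = "Nomadv13/M-CLU-v1"  # assumed Hub id, taken from this card's title

def load_base_model(model_id: str = MODEL_ID):
    """Load tokenizer and base weights as a starting point for SFT or
    continued pretraining. Imports are deferred so the module can be
    inspected without transformers installed; calling this requires
    network access to the Hub (or a local cache of the checkpoint)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model

# tokenizer, model = load_base_model()  # uncomment once the checkpoint is reachable
```

Because this is a base model with no chat template applied, downstream training should supply its own formatting (e.g. an SFT prompt template) rather than relying on built-in conversational behavior.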