Roystar/evolai-qwen2.5-1.5b
Roystar/evolai-qwen2.5-1.5b is a 1.54-billion-parameter base causal language model from the Qwen2.5 series, developed by the Qwen Team. It uses a transformer architecture with a 32,768-token context length. Compared with Qwen2, it offers improved knowledge, coding, and mathematics capabilities, along with better instruction following and long-text generation. It is intended for further post-training such as SFT or RLHF rather than direct conversational use.
Qwen2.5-1.5B Overview
Roystar/evolai-qwen2.5-1.5b is a 1.54-billion-parameter base language model from the Qwen2.5 series, developed by the Qwen Team. It builds on its predecessor, Qwen2, with substantial improvements in the areas listed under Key Capabilities. It is a causal language model that uses a transformer architecture with RoPE, SwiGLU, and RMSNorm, and it supports a full context length of 32,768 tokens.
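As a rough illustration of how a base checkpoint like this is typically loaded for plain text completion, the sketch below assumes that the Hugging Face `transformers` and `torch` packages are installed and that the weights are published under the repository ID `Roystar/evolai-qwen2.5-1.5b` in standard Transformers format. Neither assumption is confirmed by this page. The helper names `fits_context` and `complete` and their defaults are hypothetical.

```python
MODEL_ID = "Roystar/evolai-qwen2.5-1.5b"  # assumed Hub repository ID
MAX_CONTEXT_TOKENS = 32_768  # context length stated in the overview


def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """True if the prompt plus the requested output fits in the context window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT_TOKENS


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Return a plain-text continuation of `prompt` (no chat template)."""
    # Imported lazily so this module loads even without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    if not fits_context(prompt_len, max_new_tokens):
        raise ValueError("prompt plus requested output exceeds the context length")

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `complete("The three primary colors are")` would download the weights on first use and return the model's continuation. The text returned depends on the checkpoint and the decoding settings, so no output is shown here.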
Key Capabilities
- Enhanced Knowledge, Coding & Mathematics: Broader knowledge and significantly improved coding and mathematics capabilities, which the Qwen Team attributes to specialized expert models in these domains.
- Instruction Following: Follows instructions more closely and is more resilient to diverse system prompts, which improves role-play and condition-setting for chatbots.
- Long Text & Structured Output Generation: Improved performance in generating long texts (up to 8K tokens), understanding structured data, and generating structured output, including JSON.
- Multilingual Support: Offers robust support for over 29 languages, including major global languages like Chinese, English, French, Spanish, German, and Japanese.
- Long-Context Support: The Qwen2.5 series supports contexts of up to 128K tokens in its larger models; this 1.5B model supports a full context length of 32,768 tokens.
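Because a base model's completion can wrap JSON in surrounding text, downstream code usually has to locate and validate the JSON itself. The following is a minimal sketch of one way to do that with Python's standard `json` module. The helper name `first_json_object` and the sample string are hypothetical, and the sample is not actual output from this model.

```python
import json
from typing import Any, Optional


def first_json_object(text: str) -> Optional[Any]:
    """Return the first valid JSON object or array found in `text`, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
        return value
    return None


# Hypothetical completion text, used only to exercise the helper.
sample = 'Result: {"language": "French", "tokens": 128} (end of answer)'
parsed = first_json_object(sample)
```

Scanning for the first position that parses cleanly keeps the helper tolerant of prose before or after the JSON, at the cost of accepting any valid JSON value that happens to appear first.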
Good For
- Foundation for Fine-tuning: Ideal as a base model for further post-training, such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining.
- Developing Specialized Models: Suitable for creating custom models focused on coding, mathematical reasoning, or structured data processing.
- Multilingual Applications: A strong candidate for applications requiring broad language support.
This base model is not recommended for direct conversational use without further fine-tuning. Detailed evaluation results and benchmarks are available in the official Qwen2.5 blog.
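Since a base model continues text rather than answering chat turns, a common way to try it before any fine-tuning is a few-shot completion prompt. This is a general technique, not something prescribed by the Qwen Team or this page. The sketch below builds such a prompt; the function name `build_few_shot_prompt` and the `Input`/`Output` labels are hypothetical choices.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]],
    query: str,
    input_label: str = "Input",
    output_label: str = "Output",
) -> str:
    """Format worked examples plus a new query as one completion-style prompt."""
    blocks = [
        f"{input_label}: {source}\n{output_label}: {target}"
        for source, target in examples
    ]
    # The final block is left open so the model continues after the output label.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt([("2 + 2", "4"), ("3 + 5", "8")], "7 + 1")
```

The resulting string can be passed to any text-completion call. In practice the generation is usually cut at the next blank line, since a base model tends to keep inventing further examples.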