jiamingfeatherless/Qwen2.5-0.5B-Histogram
jiamingfeatherless/Qwen2.5-0.5B-Histogram is a 0.49 billion parameter causal language model from the Qwen2.5 series, developed by the Qwen Team. This base model uses a transformer architecture with RoPE, SwiGLU, and RMSNorm, and supports a context length of 32,768 tokens. It offers significantly improved knowledge, coding, and mathematics capabilities compared to its predecessor, Qwen2, and is intended as a starting point for fine-tuning on specific applications.
Qwen2.5-0.5B Overview
This repository hosts the Qwen2.5-0.5B model, a 0.49 billion parameter base causal language model from the Qwen2.5 series by the Qwen Team. It builds upon the Qwen2 architecture, with improvements in knowledge, coding, and mathematics attributed to specialized expert models in those domains. The model supports a context length of 32,768 tokens. It is a pretrained base model intended as a starting point for further training, not for direct conversational use.
Key Capabilities & Features
- Enhanced Knowledge & Skills: Broader knowledge and significantly improved coding and mathematics capabilities compared with Qwen2.
- Long-Context Support: Handles up to 32,768 tokens, with the broader Qwen2.5 series supporting up to 128K tokens.
- Multilingual: Supports over 29 languages, including Chinese, English, French, Spanish, and more.
- Structured Data & Output: Improved understanding of structured data (e.g., tables) and generation of structured outputs like JSON.
- Robust Instruction Following: More resilient to diverse system prompts, which improves role-play and condition-setting for chatbots built on the series.
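Because this is a base model, the most direct way to try it is plain text completion rather than chat. The following is a minimal sketch, assuming the repository is published on the Hugging Face Hub in a format that the `transformers` library can load and that `torch` is installed. The helper names `max_new_tokens_for` and `complete` are illustrative and not part of any library. The sketch also shows one way to keep prompt plus output within the 32,768-token window mentioned above.

```python
# Repository name taken from this page; loading it via transformers is an assumption.
MODEL_ID = "jiamingfeatherless/Qwen2.5-0.5B-Histogram"
CONTEXT_LENGTH = 32_768  # context window stated in this model card


def max_new_tokens_for(prompt_tokens: int, requested: int,
                       context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp the generation budget so prompt + output fits the context window."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt already fills the context window")
    return min(requested, context_length - prompt_tokens)


def complete(prompt: str, model_id: str = MODEL_ID,
             requested_new_tokens: int = 64) -> str:
    """Continue `prompt` as raw text (no chat template, since this is a base model)."""
    # Imported lazily so the budget helper above works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    budget = max_new_tokens_for(prompt_len, requested_new_tokens)

    output_ids = model.generate(**inputs, max_new_tokens=budget)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `complete("The capital of France is")` would download the weights on first use and return a continuation of the prompt. The output of a base model is a continuation, not an answer to an instruction, so prompts usually work best when phrased as the beginning of the desired text.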
Intended Use
This 0.5B base model is primarily intended for post-training applications such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining. It is not recommended for direct conversational use without further fine-tuning. Developers can leverage its enhanced base capabilities to build specialized models for various tasks.
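As an illustration of what preparing data for Supervised Fine-Tuning can involve, the sketch below builds one training example in the common prompt-masking style: the prompt and response token IDs are concatenated, and the prompt positions in the labels are set to -100 so that the loss is computed only on the response. This is one widely used convention, not a method prescribed by this model card. The function name `build_sft_example` and the toy token IDs are hypothetical.

```python
# -100 is the default ignore_index of PyTorch's CrossEntropyLoss,
# so positions labeled with it contribute nothing to the loss.
IGNORE_INDEX = -100


def build_sft_example(prompt_ids, response_ids, eos_id, max_len=32_768):
    """Concatenate prompt and response; mask prompt positions in the labels."""
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    # Truncate from the right so the sequence fits the context window.
    return {"input_ids": input_ids[:max_len], "labels": labels[:max_len]}


# Toy IDs for illustration only; real IDs would come from the model's tokenizer.
example = build_sft_example(prompt_ids=[1, 2, 3], response_ids=[4, 5], eos_id=0)
print(example["input_ids"])  # [1, 2, 3, 4, 5, 0]
print(example["labels"])     # [-100, -100, -100, 4, 5, 0]
```

Dictionaries of this shape can be batched and passed to a causal language model that accepts `input_ids` and `labels`. Appending the end-of-sequence token to the response teaches the fine-tuned model where to stop generating.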