HoangTran223/MCW_KD_Teacher_Qwen1.5-1.8B
HoangTran223/MCW_KD_Teacher_Qwen1.5-1.8B is a 1.8-billion-parameter language model based on the Qwen1.5 architecture, the beta version of Qwen2. The underlying Qwen1.5 model, developed by Qwen, supports a stable 32K context length and has an improved tokenizer for multilingual text and code. It is intended primarily as a base model for further post-training, such as SFT or RLHF, rather than for direct text generation.
Qwen1.5-1.8B Overview
HoangTran223/MCW_KD_Teacher_Qwen1.5-1.8B is a 1.8-billion-parameter model from the Qwen1.5 series, which serves as the beta release of Qwen2. The Qwen1.5 series is described as a Transformer architecture incorporating SwiGLU activation, attention QKV bias, and group query attention. It also features an enhanced tokenizer adapted to multiple natural languages and to code.
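To make the SwiGLU term above concrete, here is a minimal pure-Python sketch of a SwiGLU feed-forward block. It is illustrative only: the function names and the tiny list-based matrices are assumptions made for this example, not the model's actual implementation, and the real layers operate on large tensors.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Toy feed-forward block: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]  # gating branch
    up = matvec(w_up, x)                         # linear branch
    hidden = [g * u for g, u in zip(gate, up)]   # element-wise gating
    return matvec(w_down, hidden)                # project back down
```

The gating branch decides, per hidden unit, how much of the linear branch passes through, which is what distinguishes SwiGLU from a plain two-layer MLP with a single activation.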
Key Capabilities
- Stable 32K Context Length: Supports a consistent 32,768 token context window across all model sizes in the Qwen1.5 series.
- Multilingual Support: Both base and chat models offer robust multilingual capabilities.
- Improved Tokenizer: Features an advanced tokenizer optimized for a range of natural languages and programming languages.
- No trust_remote_code Requirement: Simplifies integration and usage within the Hugging Face Transformers ecosystem.
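As a rough illustration of the points above, the following sketch loads the checkpoint with the stock Transformers auto classes (no trust_remote_code argument) and checks a token budget against the 32,768-token window. It assumes the repository is available on the Hugging Face Hub under the ID shown in this card and that a recent transformers release is installed (the upstream Qwen1.5 card recommends 4.37.0 or later); `fits_in_context` and `load` are hypothetical helper names introduced for this example.

```python
MODEL_ID = "HoangTran223/MCW_KD_Teacher_Qwen1.5-1.8B"  # repo ID as named in this card
MAX_CONTEXT_TOKENS = 32_768  # context length stated in this card


def fits_in_context(num_prompt_tokens: int, max_new_tokens: int = 0) -> bool:
    """Return True if prompt plus generated tokens stay within the 32K window."""
    return num_prompt_tokens + max_new_tokens <= MAX_CONTEXT_TOKENS


def load(model_id: str = MODEL_ID):
    """Load tokenizer and model with the standard auto classes."""
    # Imported lazily so fits_in_context works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Note: no trust_remote_code=True is passed here.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling `load()` downloads the weights on first use. Because this is a base model, any text it generates is a plain continuation of the prompt rather than a chat-style answer.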
Good for
- Further Fine-tuning: Ideal as a foundational base model for subsequent training stages like Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining.
- Research and Development: Suitable for researchers exploring new language model architectures and training methodologies.
- Custom Application Development: Provides a strong starting point for developers to build specialized language applications by applying domain-specific post-training.
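For the fine-tuning use cases listed above, one common data-preparation step in SFT is to compute the loss only on response tokens by masking the prompt positions in the labels. The sketch below shows that step on plain lists of token IDs; `build_sft_example` is a hypothetical helper, the token IDs are made up, and right-truncation to the context window is just one possible policy. It is not the procedure used to train this checkpoint.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def build_sft_example(prompt_ids, response_ids, max_length=32_768):
    """Concatenate prompt and response IDs and mask the prompt in the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    # Truncate from the right so the sequence fits the context window.
    input_ids = input_ids[:max_length]
    labels = labels[:max_length]
    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": [1] * len(input_ids),
    }
```

In practice the IDs would come from the model's tokenizer, and the resulting dictionaries would be batched and padded by whichever training framework is used.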