OpenCSG csg-wukong-1B-sft-bf16 Overview
csg-wukong-1B-sft-bf16 is a 1.1-billion-parameter small language model (SLM) developed by OpenCSG. It is a fine-tuned version of the csg-wukong-1B base model, released in BF16 precision, and is intended as a compact yet capable option for general language processing tasks. OpenCSG's stated vision is to democratize generative large models and empower industries to build their own AI capabilities.
Key Characteristics & Performance
- Model Size: 1.1 billion parameters, making it suitable for resource-constrained environments or applications requiring faster inference.
- Base Model: Fine-tuned from the pre-trained csg-wukong-1B.
- Training Details: The model was trained for 43 days on 16 H800 GPUs, using DeepSpeed for distributed training orchestration and PyTorch for the neural network implementation, with BF16 mixed precision enabled via Apex. A minimal sketch for loading the model in this precision follows this list.
- Leaderboard Ranking: The csg-wukong-1B base model placed 8th among pretrained small language models of roughly 1.5B parameters on the open_llm_leaderboard, a strong result for its size class.
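Since the weights are distributed in BF16, they can be loaded directly in that precision with Hugging Face transformers. The sketch below is a minimal inference example, not an official quick start: the repository id `opencsg/csg-wukong-1B-sft-bf16` is assumed from the model name and OpenCSG's Hub organization, so verify it against the actual model page.

```python
# Minimal inference sketch. The repo id below is an assumption derived
# from the model name -- check the actual Hugging Face model page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "opencsg/csg-wukong-1B-sft-bf16"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",           # place weights on GPU if available
)

prompt = "Explain what a small language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At 1.1B parameters in BF16, the weights occupy roughly 2.2 GB, so the model fits comfortably on a single consumer GPU or, more slowly, on CPU.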
Intended Use Cases
This model is well-suited for applications where a balance between performance and computational efficiency is crucial. Its competitive ranking suggests it can be a strong candidate for:
- General text generation and understanding tasks.
- Deployment in edge devices or scenarios with limited hardware resources.
- As a foundation for further domain-specific fine-tuning (see the sketch after this list).
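Because domain-specific fine-tuning is a named use case, here is a minimal, hypothetical sketch of supervised fine-tuning with the transformers Trainer. The repo id, dataset file, and hyperparameters are illustrative placeholders, not recommendations from OpenCSG.

```python
# Hypothetical domain fine-tuning sketch using the transformers Trainer.
# The repo id, dataset file, and hyperparameters are placeholders.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "opencsg/csg-wukong-1B-sft-bf16"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder corpus: swap in your own domain text.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="wukong-1b-domain-sft",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        bf16=True,  # keep the model's native precision
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

For larger domain corpora or tighter memory budgets, parameter-efficient methods such as LoRA (via the peft library) are a common alternative to full fine-tuning at this model size.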