opencsg/csg-wukong-1B
The csg-wukong-1B is a 1.1-billion-parameter small language model (SLM) developed by OpenCSG. Pretrained on 1 trillion tokens, it is designed for efficient language tasks. On the open_llm_leaderboard it ranked 8th among pretrained small language models in the roughly 1.5B-parameter class, indicating strong performance within its size class. The model suits applications that require a compact yet capable language model.
Model Overview
The csg-wukong-1B is a 1.1-billion-parameter Small Language Model (SLM) from OpenCSG, whose vision is to democratize generative large models, making them accessible to every industry, company, and individual through open-source principles.
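For quick experimentation, the model can be loaded with the Hugging Face transformers library. The sketch below assumes the checkpoint is hosted on the Hub under opencsg/csg-wukong-1B and is compatible with the standard causal-LM classes; the generation settings are illustrative, not tuned recommendations.

```python
# Minimal sketch: load csg-wukong-1B as a standard causal LM and generate text.
# Assumes the checkpoint is hosted on the Hugging Face Hub under
# "opencsg/csg-wukong-1B" and works with AutoModelForCausalLM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "opencsg/csg-wukong-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)  # illustrative settings
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```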
Key Capabilities & Performance
- Compact and Efficient: With 1.1 billion parameters, it offers a balance between size and performance, making it suitable for resource-constrained environments (a parameter-count check is sketched after this list).
- Extensive Pretraining: The model was pretrained on a substantial dataset of 1 trillion tokens, contributing to its language understanding and generation capabilities.
- Competitive Ranking: It has demonstrated strong performance on the open_llm_leaderboard, ranking 8th among pretrained SLMs in the ~1.5B parameter class.
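As a quick sanity check on the stated size, the snippet below counts the loaded model's parameters; this is a hypothetical verification step, not part of any official tooling, and it reuses the `model` object from the loading sketch above.

```python
# Hypothetical sanity check: count the parameters of the loaded model.
num_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {num_params / 1e9:.2f}B")  # expected to be roughly 1.1B
```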
Training Details
The csg-wukong-1B was trained over 43 days on 16 H800 GPUs. The training pipeline used DeepSpeed for distributed orchestration, PyTorch as the deep learning framework, and NVIDIA Apex for BF16 mixed-precision training.
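The actual training configuration is not published here. The snippet below is only a minimal sketch of what a DeepSpeed setup with BF16 enabled might look like; the batch sizes, ZeRO stage, optimizer choice, and learning rate are all assumptions for illustration, not the hyperparameters used for csg-wukong-1B.

```python
# Illustrative sketch only: a DeepSpeed configuration enabling BF16 mixed
# precision. Every value below is a placeholder assumption, not the actual
# csg-wukong-1B training setup.
import deepspeed
import torch.nn as nn

ds_config = {
    "train_micro_batch_size_per_gpu": 8,   # hypothetical value
    "gradient_accumulation_steps": 4,      # hypothetical value
    "bf16": {"enabled": True},             # BF16 mixed precision
    "zero_optimization": {"stage": 2},     # hypothetical ZeRO stage
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 3e-4},            # hypothetical learning rate
    },
}

model = nn.Linear(10, 10)  # stand-in for the actual Transformer model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```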
Use Cases
This model is well-suited to applications where a small memory and compute footprint is critical but effective language processing is still required. Its competitive ranking among SLMs suggests utility across a range of downstream tasks.
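For deployment under tight memory budgets, one common approach is to load the weights in half precision. The sketch below assumes float16 accuracy is acceptable for the target task; it is a generic transformers pattern, not an official deployment recipe for this model.

```python
# Minimal sketch: load the model in half precision to roughly halve memory use
# relative to float32 (about 2.2 GB of weights for 1.1B parameters at 2 bytes each).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "opencsg/csg-wukong-1B",
    torch_dtype=torch.float16,   # half-precision weights
    low_cpu_mem_usage=True,      # avoid materializing a full fp32 copy in RAM
)
```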