opencsg/csg-wukong-1B

Hugging Face · Text Generation · Model size: 1.1B · Quantization: BF16 · Context length: 2k · Published: Apr 11, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

The csg-wukong-1B is a 1.1 billion-parameter small language model (SLM) developed by OpenCSG. Pretrained on 1 trillion tokens, the model is designed for efficient language tasks. It ranks 8th among pretrained small language models in the ~1.5B parameter class on the open_llm_leaderboard, indicating strong performance within its size class, and it is suitable for applications that require a compact yet capable language model.


Model Overview

The csg-wukong-1B is a 1.1 billion-parameter Small Language Model (SLM) developed by OpenCSG. OpenCSG's vision is to democratize generative large models, making them accessible for every industry, company, and individual through open-source principles.
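As a quick start, the model can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch, assuming the weights are hosted on the Hub under opencsg/csg-wukong-1B and expose the standard causal-LM interface; the prompt and generation settings are illustrative, not recommended values from OpenCSG.

```python
# Minimal inference sketch for csg-wukong-1B (settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "opencsg/csg-wukong-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in BF16, matching the published precision of the weights.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```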

Key Capabilities & Performance

  • Compact and Efficient: With 1.1 billion parameters, it balances size against performance, making it suitable for resource-constrained environments (see the footprint estimate after this list).
  • Extensive Pretraining: The model was pretrained on a substantial dataset of 1 trillion tokens, contributing to its language understanding and generation capabilities.
  • Competitive Ranking: It has demonstrated strong performance on the open_llm_leaderboard, ranking 8th among pretrained SLMs in the ~1.5B parameter class.
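As a rough footprint estimate (an illustration, not a figure published by OpenCSG): 1.1 billion parameters stored in BF16 occupy about 1.1e9 × 2 bytes ≈ 2.2 GB for the weights alone, before activations and KV cache, which is why the model fits comfortably on a single consumer GPU.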

Training Details

The csg-wukong-1B was trained over 43 days on 16 NVIDIA H800 GPUs. The pipeline used DeepSpeed for distributed training orchestration, PyTorch as the deep learning framework, and Apex for BF16 mixed-precision support.
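OpenCSG has not published the actual training configuration; only the use of DeepSpeed, PyTorch, and BF16 comes from the model card. The sketch below shows what a minimal DeepSpeed setup with BF16 enabled might look like, with the batch size, accumulation steps, learning rate, and ZeRO stage all marked as assumptions.

```python
# Illustrative DeepSpeed setup only; values marked "assumed" are not from
# the model card. Only DeepSpeed + PyTorch + BF16 are documented.
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("opencsg/csg-wukong-1B")

ds_config = {
    "train_micro_batch_size_per_gpu": 8,                       # assumed value
    "gradient_accumulation_steps": 8,                          # assumed value
    "bf16": {"enabled": True},                                 # BF16, as stated
    "optimizer": {"type": "AdamW", "params": {"lr": 3e-4}},    # assumed optimizer/lr
    "zero_optimization": {"stage": 2},                         # assumed ZeRO stage
}

# Wrap the model in a DeepSpeed engine for distributed BF16 training.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```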

Use Cases

This model is well-suited for applications where a small footprint is critical but effective language processing is still required, such as cost-sensitive or resource-limited deployments. Its competitive ranking among SLMs suggests solid performance across common downstream language tasks.