opencsg/csg-wukong-1B-sft-bf16

Hugging Face · Text Generation

Model Size: 1.1B · Quant: BF16 · Ctx Length: 2k · License: apache-2.0 · Architecture: Transformer · Concurrency Cost: 1 · Open Weights

The csg-wukong-1B-sft-bf16 is a 1.1 billion parameter small language model developed by OpenCSG, fine-tuned from the csg-wukong-1B base model. It is optimized for general language tasks and has shown competitive performance among pretrained small language models of roughly 1.5B parameters. It was trained on 16 H800 GPUs over 43 days, using DeepSpeed and PyTorch.


OpenCSG csg-wukong-1B-sft-bf16 Overview

The csg-wukong-1B-sft-bf16 is a 1.1 billion parameter small language model (SLM) developed by OpenCSG. It is a fine-tuned version of the csg-wukong-1B base model, designed to offer a compact yet capable solution for various language processing tasks. OpenCSG's vision emphasizes democratizing generative large models and empowering industries with their own AI capabilities.
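For orientation, here is a minimal inference sketch using the standard Hugging Face transformers API. The repo id matches this card; the prompt and generation settings are illustrative assumptions, so check the hub model card for the canonical usage.

```python
# Minimal inference sketch (transformers); prompt and settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "opencsg/csg-wukong-1B-sft-bf16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights ship in BF16 per the card
    device_map="auto",
)

prompt = "Explain what a small language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The card lists a 2k context window; keep prompt + generation within it.
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```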

Key Characteristics & Performance

  • Model Size: 1.1 billion parameters, making it suitable for resource-constrained environments or applications requiring faster inference.
  • Base Model: Fine-tuned from the pre-trained csg-wukong-1B.
  • Training Details: The model was trained for 43 days on 16 H800 GPUs, using DeepSpeed for orchestration and PyTorch for the neural network implementation, with BF16 precision enabled via Apex (a representative configuration sketch follows this list).
  • Leaderboard Ranking: The csg-wukong-1B base model reached a notable 8th place among pretrained small language models of roughly 1.5B parameters on the open_llm_leaderboard, indicating strong performance within its size class.
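The card states that DeepSpeed was used with BF16 precision but does not publish the training configuration. Purely as an illustration of what such a setup commonly looks like, here is a hypothetical DeepSpeed configuration sketch; every value below is an assumption, not OpenCSG's actual setting.

```python
# Hypothetical DeepSpeed config illustrating BF16 training; values are placeholders.
ds_config = {
    "bf16": {"enabled": True},            # native BF16 mixed precision
    "zero_optimization": {"stage": 2},    # ZeRO stage is an assumption
    "train_micro_batch_size_per_gpu": 8,  # placeholder batch size
    "gradient_accumulation_steps": 4,     # placeholder accumulation
}

# A dict like this can be passed to deepspeed.initialize(config=ds_config, ...)
# or to the Hugging Face Trainer via TrainingArguments(deepspeed=ds_config).
```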

Intended Use Cases

This model is well-suited for applications where a balance between performance and computational efficiency is crucial. Its competitive ranking suggests it can be a strong candidate for:

  • General text generation and understanding tasks.
  • Deployment on edge devices or in scenarios with limited hardware resources.
  • Serving as a foundation for further domain-specific fine-tuning (a minimal sketch follows this list).
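To illustrate the last point, the following is a minimal, hypothetical fine-tuning sketch using the Hugging Face Trainer; the dataset file, hyperparameters, and output directory are placeholders, not a recipe from OpenCSG.

```python
# Hypothetical domain fine-tuning sketch; data and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "opencsg/csg-wukong-1B-sft-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token

# "domain_corpus.txt" is a placeholder for your own domain data.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]

def tokenize(batch):
    # Stay within the model's 2k context window.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="wukong-domain-sft",   # placeholder path
        per_device_train_batch_size=4,    # placeholder hyperparameters
        num_train_epochs=1,
        bf16=True,                        # keep training in BF16 like the base run
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```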