Overview
STILL-seed2 is a 32.8-billion-parameter language model fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-32B base model. It was trained specifically on the 'still' dataset, which suggests a specialization toward that dataset's domain. The model supports a context length of 32768 tokens, allowing it to process and generate long sequences of text.
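The parameter count alone gives a rough sense of the deployment footprint. A minimal back-of-the-envelope sketch (weights only, ignoring KV cache and activations, which depend on architecture details not stated here):

```python
PARAMS = 32.8e9  # parameter count stated in the model card

def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Approximate memory needed just to hold the model weights."""
    return params * bytes_per_param / 1e9

# Common precisions and their per-parameter byte costs.
for name, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{name}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

At bf16, the weights alone come to roughly 65.6 GB, so multi-GPU or quantized inference is the practical path for a model of this size.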
Training Details
Fine-tuning ran for 17 epochs with a learning rate of 1e-05 and a total training batch size of 96 across 8 GPUs. The optimizer was AdamW with a cosine learning-rate schedule and a warmup ratio of 0.1. This configuration points to a thorough adaptation of the base model to the 'still' dataset.
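The stated schedule (cosine decay with a 0.1 warmup ratio and a peak rate of 1e-05) can be sketched as follows; the function below mirrors the common linear-warmup-plus-cosine-decay recipe and is an illustration, not the exact training code:

```python
import math

def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 1e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With a total batch size of 96 spread over 8 GPUs, the implied per-device batch (before any gradient accumulation) is 96 / 8 = 12.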
Intended Use
Specific intended uses and limitations are not documented, but fine-tuning on a specialized dataset implies the model is best suited to tasks aligned with the 'still' dataset's content. Developers should weigh its large parameter count and context window when targeting applications that require deep understanding of, or generation over, long texts.