moogician/STILL-seed2

Text generation | Concurrency cost: 2 | Model size: 32.8B | Quant: FP8 | Context length: 32k | Published: Mar 23, 2025 | License: other | Architecture: Transformer

moogician/STILL-seed2 is a 32.8 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B. It supports a 32768 token context length and was fine-tuned on the 'still' dataset. The model is intended for tasks that benefit from this specialized fine-tuning, though its capabilities beyond those of the base model are not documented.


Overview

STILL-seed2 is a 32.8 billion parameter language model fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-32B base model. It was trained specifically on the 'still' dataset, suggesting specialization toward that dataset's domain. The model supports a substantial context length of 32768 tokens, allowing it to process and generate long sequences of text.

Training Details

The fine-tuning process involved 17 epochs with a learning rate of 1e-05 and a total training batch size of 96, utilizing 8 GPUs. The optimizer used was AdamW with cosine learning rate scheduling and a warmup ratio of 0.1. This training configuration suggests an effort to thoroughly adapt the base model to the 'still' dataset's nuances.
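The warmup-plus-cosine schedule described above can be sketched as a small function. The peak learning rate (1e-05) and warmup ratio (0.1) come from the card; the total step count is a hypothetical placeholder, since the card does not state it:

```python
import math

LR_MAX = 1e-05        # peak learning rate reported in the card
WARMUP_RATIO = 0.1    # warmup ratio reported in the card
TOTAL_STEPS = 1000    # hypothetical; the card does not report total steps

def lr_at(step, total_steps=TOTAL_STEPS, lr_max=LR_MAX, warmup_ratio=WARMUP_RATIO):
    """Linear warmup followed by cosine decay to zero,
    the common AdamW + cosine scheduling recipe."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 to lr_max during warmup.
        return lr_max * step / warmup_steps
    # Cosine decay from lr_max down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return lr_max * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate is 0 at step 0, peaks at 1e-05 once warmup ends, and decays back toward 0 by the final step. The reported total batch size of 96 across 8 GPUs likewise implies 12 samples per GPU per step (some split of per-device batch size and gradient accumulation, not specified in the card).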

Intended Use

While the author does not detail specific intended uses or limitations, fine-tuning on a specialized dataset implies the model is best suited for tasks aligned with the 'still' dataset's content. Its large parameter count and 32k context window make it a candidate for applications requiring understanding or generation over long documents.