moogician/DSR1-Qwen-32B-scg

TEXT GENERATION · Concurrency Cost: 2 · Model Size: 32B · Quant: FP8 · Ctx Length: 32k · License: other · Architecture: Transformer · Cold

moogician/DSR1-Qwen-32B-scg is a 32-billion-parameter causal language model fine-tuned from DeepSeek-R1-Distill-Qwen-32B on the cwepy10 dataset. With a context length of 32768 tokens, it targets applications that need deep contextual understanding combined with the domain-specific behavior imparted by its fine-tuning data.


Model Overview

moogician/DSR1-Qwen-32B-scg is a 32-billion-parameter language model derived from DeepSeek-R1-Distill-Qwen-32B. It has been fine-tuned on the cwepy10 dataset to improve performance within the domain that dataset represents, and it supports a context length of 32768 tokens, enabling it to process and generate content with extensive contextual awareness.

Key Characteristics

  • Base Model: Fine-tuned from DeepSeek-R1-Distill-Qwen-32B.
  • Parameter Count: 32 billion parameters.
  • Context Window: 32768 tokens, suitable for tasks requiring long-range dependencies.
  • Specialized Training: Adapted using the cwepy10 dataset, suggesting optimized performance for tasks related to this data.
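Assuming the model is published on the Hugging Face Hub under this repo id (the card does not confirm this), a minimal loading-and-inference sketch with the `transformers` library might look like the following. The `fits_in_context` helper is a small convenience for budgeting against the 32768-token window, not part of any official API:

```python
MAX_CONTEXT = 32768  # context window stated on the card


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    max_ctx: int = MAX_CONTEXT) -> bool:
    """Check that the prompt plus the generation budget stays within the window."""
    return prompt_tokens + max_new_tokens <= max_ctx


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Hypothetical inference helper; requires the `transformers` library and a
    GPU with enough memory for a 32B checkpoint."""
    # Heavy dependencies are imported lazily so the helper above stays lightweight.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "moogician/DSR1-Qwen-32B-scg"  # assumed Hub repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    n_prompt = inputs["input_ids"].shape[1]
    assert fits_in_context(n_prompt, max_new_tokens), "prompt too long for context"
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(out[0][n_prompt:], skip_special_tokens=True)
```

In practice an FP8 checkpoint of this size is more commonly served through an inference engine such as vLLM; the sketch above only illustrates the basic `transformers` flow.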

Training Details

The model was trained with a learning rate of 1e-05 for 6.0 epochs, using a cosine learning rate scheduler with a warmup ratio of 0.1. Training used the ADAMW_TORCH optimizer with a total batch size of 8 spread across 4 GPUs (2 per device).
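The reported schedule (cosine decay with a 0.1 warmup ratio at a 1e-05 peak) can be sketched as a plain function; this mirrors the common linear-warmup-plus-cosine-decay shape, though the exact trainer implementation may differ in details such as a minimum learning rate:

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float = 1e-5,
               warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step under linear warmup
    followed by cosine decay to zero (sketch of the reported schedule)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For example, with 100 total steps the rate ramps up over the first 10 steps, peaks at 1e-05, and decays smoothly to zero by step 100.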

Potential Use Cases

Given its fine-tuning on a specific dataset, this model is likely best suited for applications that align with the characteristics and content of the cwepy10 dataset. Developers should consider its specialized nature for tasks where domain-specific knowledge or generation is critical.