Model Overview
moogician/DSR1-Qwen-32B-scg-fixed is a 32-billion-parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B. It has been fine-tuned on the cwepy10 dataset, suggesting a specialization toward the characteristics and tasks of that data. It supports a context length of 32768 tokens, allowing it to process and generate long sequences of text.
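As a sketch of how the model could be loaded, assuming the repository id above is available on the Hugging Face Hub and that the standard `transformers` API applies (the card itself does not show usage code):

```python
MODEL_ID = "moogician/DSR1-Qwen-32B-scg-fixed"  # repo id from this card

def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model. A 32B model needs substantial GPU memory,
    so device_map="auto" shards it across available devices.
    (Import is kept inside the function so the sketch can be inspected
    without transformers installed.)"""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # requires the accelerate package
    )
    return tokenizer, model

# Example (not run here; this downloads the full 32B checkpoint):
# tokenizer, model = load_model()
# inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=64)
# print(tokenizer.decode(out[0]))
```

Keeping generation inputs within the 32768-token context window is the caller's responsibility; longer prompts must be truncated or chunked.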
Key Characteristics
- Base Model: Fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B.
- Parameter Count: 32 billion parameters.
- Context Length: 32768 tokens.
- Training Data: Fine-tuned on the cwepy10 dataset.
- Training Hyperparameters: Learning rate of 1e-05, total batch size of 96, and a cosine learning rate scheduler over 6 epochs.
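The cosine schedule named above can be sketched as follows. Only the peak learning rate (1e-05) and epoch count (6) come from this card; the total step count, the decay floor, and the absence of warmup are assumptions for illustration:

```python
import math

PEAK_LR = 1e-05  # from the card
EPOCHS = 6       # from the card

def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Cosine-decayed learning rate: peak_lr at step 0, min_lr at the end.
    (The actual trainer may add a warmup phase; the card does not say.)"""
    progress = min(step / max(total_steps, 1), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# With, say, 1200 optimizer steps (hypothetical; the dataset size is not given):
total = 1200
print(cosine_lr(0, total))           # peak, 1e-05
print(cosine_lr(total // 2, total))  # midpoint, about half the peak
print(cosine_lr(total, total))       # decayed to the floor
```

The midpoint of a cosine schedule sits at half the peak rate, and most of the decay happens in the middle third of training, which is why it is a common default for fine-tuning runs like this one.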
Potential Use Cases
Given its fine-tuning on the cwepy10 dataset, this model is likely best suited for applications that align with the nature and content of that specific dataset. Developers should evaluate its performance on tasks similar to those present in cwepy10 to determine its suitability for their particular use case.