moogician/DSR1-Qwen-32B-scg-fixed
The moogician/DSR1-Qwen-32B-scg-fixed model is a 32-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on the cwepy10 dataset, with a context length of 32768 tokens. It adapts the DeepSeek-R1-Distill-Qwen architecture to the tasks represented in its fine-tuning data.
Model Overview
moogician/DSR1-Qwen-32B-scg-fixed is a 32-billion-parameter language model derived from the deepseek-ai/DeepSeek-R1-Distill-Qwen-32B architecture. It has been fine-tuned on the cwepy10 dataset, which specializes it toward the characteristics and tasks of that data. It supports a context length of 32768 tokens, allowing it to process and generate long text sequences.
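A minimal loading sketch, assuming the model exposes the standard Hugging Face Transformers causal-LM interface of its DeepSeek-R1-Distill-Qwen-32B base; the dtype and device placement below are illustrative choices, not requirements stated in this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moogician/DSR1-Qwen-32B-scg-fixed"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps 32B weights manageable; an assumption
    device_map="auto",           # shard across available GPUs (requires accelerate)
)
```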
Key Characteristics
- Base Model: Fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B.
- Parameter Count: 32 billion parameters.
- Context Length: 32768 tokens.
- Training Data: Fine-tuned on the cwepy10 dataset.
- Training Hyperparameters: Learning rate of 1e-05, total batch size of 96, and a cosine learning rate scheduler over 6 epochs (a hedged sketch follows this list).
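The card states only the hyperparameter values, not the training framework. As an illustration, here is how those values could map onto transformers.TrainingArguments; the per-device batch size, gradient-accumulation split, and mixed-precision setting are assumptions, since only the total batch size of 96 is given:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="DSR1-Qwen-32B-scg-fixed",
    learning_rate=1e-5,             # stated learning rate
    num_train_epochs=6,             # stated epoch count
    lr_scheduler_type="cosine",     # stated scheduler
    per_device_train_batch_size=4,  # hypothetical: 4 x 3 accumulation x 8 GPUs = 96 total
    gradient_accumulation_steps=3,  # hypothetical split; only the total of 96 is stated
    bf16=True,                      # assumption: mixed precision is typical at this scale
)
```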
Potential Use Cases
Given its fine-tuning on the cwepy10 dataset, this model is likely best suited to applications that match the nature and content of that dataset. Developers should evaluate its performance on tasks similar to those in cwepy10 before adopting it for a particular use case.
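A short inference sketch for such an evaluation, using the standard Transformers text-generation pipeline; the prompt is a placeholder, not an example from cwepy10:

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="moogician/DSR1-Qwen-32B-scg-fixed",
    device_map="auto",  # shard across available GPUs (requires accelerate)
)

# Placeholder prompt; substitute a task drawn from your own evaluation set.
result = generator("Your task-specific prompt here", max_new_tokens=256)
print(result[0]["generated_text"])
```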