moogician/DSR1-Qwen-32B-scg

TEXT GENERATION · Concurrency Cost: 2 · Model Size: 32B · Quant: FP8 · Ctx Length: 32k · License: other · Architecture: Transformer · Cold

moogician/DSR1-Qwen-32B-scg is a 32-billion-parameter causal language model fine-tuned from DeepSeek-R1-Distill-Qwen-32B on the cwepy10 dataset. With a context length of 32768 tokens, it targets applications that need deep contextual understanding combined with the domain-specific behavior imparted by its fine-tuning data.


Model Overview

moogician/DSR1-Qwen-32B-scg is a 32-billion-parameter language model derived from DeepSeek-R1-Distill-Qwen-32B. It has been fine-tuned on the cwepy10 dataset to improve performance within the domain that dataset represents, and it supports a context length of 32768 tokens, enabling it to process and generate content with extensive contextual awareness.

Key Characteristics

  • Base Model: Fine-tuned from DeepSeek-R1-Distill-Qwen-32B.
  • Parameter Count: 32 billion parameters.
  • Context Window: 32768 tokens, suitable for tasks requiring long-range dependencies.
  • Specialized Training: Adapted using the cwepy10 dataset, suggesting optimized performance for tasks related to this data.
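Assuming the model is published on the Hugging Face Hub under this repo id (the card does not confirm this), a minimal loading-and-inference sketch with the `transformers` library might look like the following. The `fits_in_context` helper is a small convenience for budgeting against the 32768-token window, not part of any official API:

```python
MAX_CONTEXT = 32768  # context window stated on the card


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    max_ctx: int = MAX_CONTEXT) -> bool:
    """Check that the prompt plus the generation budget stays within the window."""
    return prompt_tokens + max_new_tokens <= max_ctx


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Hypothetical inference helper; requires the `transformers` library and a
    GPU with enough memory for a 32B checkpoint."""
    # Heavy dependencies are imported lazily so the helper above stays lightweight.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "moogician/DSR1-Qwen-32B-scg"  # assumed Hub repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    n_prompt = inputs["input_ids"].shape[1]
    assert fits_in_context(n_prompt, max_new_tokens), "prompt too long for context"
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(out[0][n_prompt:], skip_special_tokens=True)
```

In practice an FP8 checkpoint of this size is more commonly served through an inference engine such as vLLM; the sketch above only illustrates the basic `transformers` flow.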

Training Details

The model was trained with a learning rate of 1e-05 for 6.0 epochs, using a cosine learning rate scheduler with a warmup ratio of 0.1. Training used the ADAMW_TORCH optimizer with a total batch size of 8 spread across 4 GPUs (2 per device).
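The reported schedule (cosine decay with a 0.1 warmup ratio at a 1e-05 peak) can be sketched as a plain function; this mirrors the common linear-warmup-plus-cosine-decay shape, though the exact trainer implementation may differ in details such as a minimum learning rate:

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float = 1e-5,
               warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step under linear warmup
    followed by cosine decay to zero (sketch of the reported schedule)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For example, with 100 total steps the rate ramps up over the first 10 steps, peaks at 1e-05, and decays smoothly to zero by step 100.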

Potential Use Cases

Given its fine-tuning on a specific dataset, this model is likely best suited for applications that align with the characteristics and content of the cwepy10 dataset. Developers should consider its specialized nature for tasks where domain-specific knowledge or generation is critical.