moogician/DSR1-Qwen-32B-still
Overview
moogician/DSR1-Qwen-32B-still is a 32-billion-parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B. Its distinguishing feature is fine-tuning on the "still" dataset, which suggests a specialization toward that dataset's domain, although the card does not describe the data itself. The model supports a context length of 32,768 tokens, allowing it to process lengthy texts and complex queries.
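The advertised context window can be checked directly from the model's configuration without downloading the weights. A minimal sketch, assuming the repository name above and the standard transformers AutoConfig API:

```python
from transformers import AutoConfig

# Fetch only the config (no weights) to inspect the context window.
config = AutoConfig.from_pretrained("moogician/DSR1-Qwen-32B-still")

# Qwen2-based models expose the context length as max_position_embeddings;
# per this card the usable context is 32,768 tokens.
print(config.max_position_embeddings)
```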
Training Details
The model was trained with a learning rate of 1e-05 for 17 epochs on a multi-GPU setup with 8 devices. Key hyperparameters included a per-device train_batch_size of 2 and gradient_accumulation_steps of 6, which together with the 8 GPUs yields a total_train_batch_size of 96 (2 × 6 × 8). The optimizer was ADAMW_TORCH with default betas and epsilon, paired with a cosine learning-rate scheduler and a 0.1 warmup ratio. The training environment used Transformers 4.49.0, PyTorch 2.5.1+cu124, Datasets 3.2.0, and Tokenizers 0.21.0.
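For reference, the reported hyperparameters map onto the standard transformers TrainingArguments roughly as follows. This is a hedged reconstruction rather than the author's actual training script; the output directory and anything else not stated in the card are placeholders:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dsr1-qwen-32b-still",   # placeholder, not from the card
    learning_rate=1e-5,
    num_train_epochs=17,
    per_device_train_batch_size=2,      # train_batch_size
    gradient_accumulation_steps=6,
    # Across 8 GPUs: 2 * 6 * 8 = 96 total train batch size.
    optim="adamw_torch",                # ADAMW_TORCH with default betas/epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```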
Intended Use
The card does not detail specific intended uses or limitations. Given the fine-tuning on a single dataset, the model may perform best on tasks within that dataset's domain; users should evaluate it against their own applications, particularly those that depend on the large context window.
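Absent an official usage snippet, a minimal generation example along the usual transformers lines is sketched below. The chat-template call assumes the tokenizer ships a chat template (the DeepSeek-R1 distills do), and the sampling settings are illustrative rather than author-recommended:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moogician/DSR1-Qwen-32B-still"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning-tuned models often emit long chains of thought, so allow a
# generous completion budget; max_new_tokens here is illustrative.
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```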