AlexCuadron/DSR1-Qwen-32B-DSR1-Qwen-32B-131fad2c

Text Generation · Concurrency Cost: 2 · Model Size: 32B · Quant: FP8 · Context Length: 32k · License: other · Architecture: Transformer

AlexCuadron/DSR1-Qwen-32B is a 32-billion-parameter causal language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on the fc_rlm dataset. It supports a 32,768-token context length and was trained with a learning rate of 1e-05 over 4 epochs. Its defining characteristic is the fine-tuning on the fc_rlm dataset, which suggests potential specialization for tasks related to that data.


Model Overview

AlexCuadron/DSR1-Qwen-32B is a 32-billion-parameter language model based on deepseek-ai/DeepSeek-R1-Distill-Qwen-32B. It has been fine-tuned on the fc_rlm dataset, indicating potential specialization for tasks aligned with the characteristics of that dataset. The model supports a substantial context length of 32,768 tokens.
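
The checkpoint follows the standard causal-LM layout of its base model, so it should load with the usual Hugging Face transformers API. The sketch below is illustrative: it assumes the weights are published under the repo id AlexCuadron/DSR1-Qwen-32B and that sufficient GPU memory (or a quantized/offloaded setup) is available for a 32B model.

```python
# Minimal loading and generation sketch using Hugging Face transformers.
# Assumes the repo id "AlexCuadron/DSR1-Qwen-32B" and a standard causal-LM checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AlexCuadron/DSR1-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 32B model generally needs multi-GPU or quantization
    device_map="auto",
)

prompt = "Explain the difference between supervised fine-tuning and distillation."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```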

Training Details

The model underwent training with the following key hyperparameters:

  • Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  • Dataset: fc_rlm
  • Learning Rate: 1e-05
  • Batch Size: 2 per device (train) and 8 per device (eval), with 4 gradient accumulation steps; the total effective batch size is reported as 64 for both training and evaluation, implying data-parallel training across multiple devices.
  • Optimizer: ADAMW_TORCH with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: 4.0
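
For reference, these settings map naturally onto Hugging Face TrainingArguments. The sketch below is illustrative only: the actual training script, device count, and any additional settings are not documented in the model card, and the output directory name is hypothetical.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# This is not the author's training script; it only reflects the values above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dsr1-qwen-32b-fc_rlm",   # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=4.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                 # default betas and epsilon
)
```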

Intended Uses & Limitations

Because of its fine-tuning on the fc_rlm dataset, this model is likely best suited to applications where the characteristics of that dataset are relevant. Specific intended uses and known limitations are not detailed in the available documentation.