AlexCuadron/DSR1-Qwen-32B-DSR1-Qwen-32B-131fad2c

Text Generation · Concurrency Cost: 2 · Model Size: 32B · Quant: FP8 · Context Length: 32k · License: other · Architecture: Transformer

AlexCuadron/DSR1-Qwen-32B is a 32-billion-parameter causal language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on the fc_rlm dataset. It supports a 32,768-token context length and was trained with a learning rate of 1e-05 over 4 epochs. Its defining characteristic is the fine-tuning on the fc_rlm dataset, which suggests potential specialization for tasks related to that data.


Model Overview

AlexCuadron/DSR1-Qwen-32B is a 32-billion-parameter language model based on deepseek-ai/DeepSeek-R1-Distill-Qwen-32B. It has been fine-tuned on the fc_rlm dataset, indicating potential specialization for tasks aligned with the characteristics of that dataset. The model supports a substantial context length of 32,768 tokens.
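
The checkpoint follows the standard causal-LM layout of its base model, so it should load with the usual Hugging Face transformers API. The sketch below is illustrative: it assumes the weights are published under the repo id AlexCuadron/DSR1-Qwen-32B and that sufficient GPU memory (or a quantized/offloaded setup) is available for a 32B model.

```python
# Minimal loading and generation sketch using Hugging Face transformers.
# Assumes the repo id "AlexCuadron/DSR1-Qwen-32B" and a standard causal-LM checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AlexCuadron/DSR1-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 32B model generally needs multi-GPU or quantization
    device_map="auto",
)

prompt = "Explain the difference between supervised fine-tuning and distillation."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```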

Training Details

The model underwent training with the following key hyperparameters:

  • Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  • Dataset: fc_rlm
  • Learning Rate: 1e-05
  • Batch Size: 2 per device (train) and 8 per device (eval), with 4 gradient accumulation steps; the total effective batch size is reported as 64 for both training and evaluation, implying data-parallel training across multiple devices.
  • Optimizer: ADAMW_TORCH with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: 4.0
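
For reference, these settings map naturally onto Hugging Face TrainingArguments. The sketch below is illustrative only: the actual training script, device count, and any additional settings are not documented in the model card, and the output directory name is hypothetical.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# This is not the author's training script; it only reflects the values above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dsr1-qwen-32b-fc_rlm",   # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=4.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                 # default betas and epsilon
)
```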

Intended Uses & Limitations

Because of its fine-tuning on the fc_rlm dataset, this model is likely best suited to applications where the characteristics of that dataset are relevant. Specific intended uses and known limitations are not detailed in the available documentation.