moogician/DSR1-Qwen-32B-131fad2c

Text Generation · Concurrency Cost: 2 · Model Size: 32B · Quant: FP8 · Ctx Length: 32k · License: other · Architecture: Transformer · Cold

moogician/DSR1-Qwen-32B-131fad2c is a 32 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B. It was fine-tuned specifically on the fc_rlm dataset, which suggests an optimization for reinforcement learning from human feedback (RLHF)-style data or tasks with a similar distribution. The model retains a substantial context length of 32,768 tokens, making it suitable for processing extensive inputs.


Overview

moogician/DSR1-Qwen-32B-131fad2c is a 32 billion parameter language model derived from the deepseek-ai/DeepSeek-R1-Distill-Qwen-32B base model. It has been fine-tuned on the fc_rlm dataset, indicating a specialized training focus that likely enhances its performance on tasks related to that specific data distribution. The model supports a context length of 32768 tokens, allowing for the processing of long and complex sequences.
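The snippet below is a minimal inference sketch using Hugging Face transformers. It assumes the checkpoint inherits the chat template of its DeepSeek-R1-Distill-Qwen-32B base and that bf16 weights fit on your hardware; the generation settings are illustrative, not official recommendations.

```python
# Minimal inference sketch (assumption: the checkpoint ships the standard
# DeepSeek-R1-Distill-Qwen chat template; adjust dtype/device to your hardware).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moogician/DSR1-Qwen-32B-131fad2c"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits your accelerator
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize the key ideas behind reinforcement learning from human feedback."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```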

Key Training Details

  • Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  • Fine-tuning Dataset: fc_rlm
  • Learning Rate: 1e-05
  • Batch Size: 2 per device (train), 8 (eval), with 4 gradient accumulation steps; the reported total effective train batch size of 64 is consistent with training across 8 devices.
  • Optimizer: AdamW with standard betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with 0.1 warmup ratio.
  • Epochs: 4.0 (see the configuration sketch after this list)
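For readability, the values above map onto a Hugging Face TrainingArguments configuration roughly as sketched below. This is a reconstruction from the listed hyperparameters, not the author's actual training script; the fc_rlm dataset and data-loading pipeline are not published, so they are omitted.

```python
# Hypothetical reconstruction of the listed hyperparameters as Hugging Face
# TrainingArguments; treat this purely as a readable summary of the values above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dsr1-qwen-32b-fc_rlm",   # placeholder output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,        # effective batch of 64 implies ~8 devices
    num_train_epochs=4.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                  # AdamW with default betas and epsilon
    bf16=True,                            # assumption; training precision is not stated
)
```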

Potential Use Cases

Given its fine-tuning on the fc_rlm dataset, this model is likely optimized for:

  • Tasks requiring nuanced understanding or generation based on reinforcement learning from human feedback (RLHF) data.
  • Applications where the specific characteristics of the fc_rlm dataset are relevant for improved performance.
  • Scenarios benefiting from a 32B parameter model with a large 32K context window for detailed analysis or generation (see the serving sketch below).
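As a sketch of how the 32K context window and FP8 quantization could be exercised in practice, the example below serves the model with vLLM. The quantization flag, tensor-parallel degree, and context-length setting mirror the metadata above but are deployment assumptions, not documented settings for this checkpoint.

```python
# Hypothetical long-context serving sketch with vLLM; max_model_len and the fp8
# flag mirror the card's metadata (32k context, FP8 quant) and may need adjusting
# to the actual checkpoint format and available GPU memory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="moogician/DSR1-Qwen-32B-131fad2c",
    max_model_len=32768,
    quantization="fp8",       # assumption: runtime FP8; drop if the weights are pre-quantized
    tensor_parallel_size=2,   # assumption: split a 32B model across two GPUs
)

long_document = "..."  # up to ~32k tokens of input
params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate([f"Summarize the following document:\n\n{long_document}"], params)
print(outputs[0].outputs[0].text)
```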