DCAgent/staqc-sandboxes-traces-terminus-2_Qwen3-32B
DCAgent/staqc-sandboxes-traces-terminus-2_Qwen3-32B is a 32-billion-parameter causal language model fine-tuned from Qwen/Qwen3-32B on the mlfoundations-dev/staqc-sandboxes-traces-terminus-2 dataset. Because it is adapted to that dataset's data distribution, it is best suited for applications that require specialized knowledge from that domain.
Model Overview
This model, staqc-sandboxes-traces-terminus-2_Qwen3-32B, is a fine-tuned variant of the Qwen3-32B base model developed by Qwen, adapted through further training on the mlfoundations-dev/staqc-sandboxes-traces-terminus-2 dataset.
Key Characteristics
- Base Model: Qwen/Qwen3-32B
- Parameter Count: 32 billion parameters
- Context Length: 32768 tokens
- Fine-tuning Dataset: mlfoundations-dev/staqc-sandboxes-traces-terminus-2
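Since this is a standard causal language model on the Hub, it should load through the usual Hugging Face `transformers` API. A minimal sketch (the repo id comes from this card; the prompt, dtype, and generation settings are illustrative, and a 32B model needs roughly 64 GB+ of accelerator memory in bf16):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "DCAgent/staqc-sandboxes-traces-terminus-2_Qwen3-32B"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and run a single chat-style generation."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # pick the checkpoint's native dtype
        device_map="auto",    # shard across available GPUs
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate("Write a Python function that reverses a string."))
```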
Training Details
The fine-tuning process utilized the following key hyperparameters:
- Learning Rate: 4e-05
- Optimizer: ADAMW_TORCH_FUSED
- Batch Size: Effective batch size of 64 (1 per device × 4 gradient accumulation steps × 16 GPUs)
- Epochs: 5
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
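The hyperparameters above can be illustrated with a small sketch: the effective batch size is the product of per-device batch, gradient-accumulation steps, and GPU count, and the schedule warms up linearly over the first 10% of steps before decaying along a cosine to zero. This is a minimal stand-in for the trainer's built-in scheduler, not the training code itself:

```python
import math

PEAK_LR = 4e-5
WARMUP_RATIO = 0.1

# Effective batch size: 1 per device x 4 grad-accum steps x 16 GPUs
per_device_batch = 1
grad_accum_steps = 4
num_gpus = 16
effective_batch = per_device_batch * grad_accum_steps * num_gpus  # 64

def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to 0."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 to the peak learning rate
        return PEAK_LR * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the rate ramps to 4e-05 by step 100, then falls back toward zero by step 1000.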
Intended Use Cases
Because it was fine-tuned on a single dataset, this model performs best on applications that match the data distribution and tasks found in mlfoundations-dev/staqc-sandboxes-traces-terminus-2. Developers should weigh this specialization when choosing it for tasks that require nuanced understanding or generation in that domain, and should not expect it to outperform the base model on unrelated tasks.