laion/exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B
The laion/exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B model is an 8-billion-parameter language model fine-tuned from the Qwen3-8B architecture. It was trained on the /e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent--exp-psu-swesmith-1K_glm_4.7_traces_jupiter/snapshots/24c8342833108c3a15a23b64f37b83ff7e65efa4_thinking_preprocessed dataset. It is a specialized fine-tune: its main differentiation comes from this particular training data and process rather than from broad general-purpose capabilities.
Overview
This model, sft__exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B, is an 8-billion-parameter language model derived from the Qwen3-8B base architecture. It was fine-tuned on the /e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent--exp-psu-swesmith-1K_glm_4.7_traces_jupiter/snapshots/24c8342833108c3a15a23b64f37b83ff7e65efa4_thinking_preprocessed dataset.
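If the model is published under the repo id in the title, it should load through the standard transformers interface inherited from Qwen3-8B. The snippet below is a minimal sketch, assuming that repo id is available on the Hugging Face Hub and that the base model's chat template is retained; the prompt is only an illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id (taken from the model name in this card).
model_id = "laion/exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Explain what a unit test is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```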
Training Details
The fine-tuning was conducted over 7 epochs with a learning rate of 4e-05. Key hyperparameters included a per-device train_batch_size of 1 and gradient_accumulation_steps of 3; across the 32-GPU multi-device setup this yields a total_train_batch_size of 96 (1 × 3 × 32). The optimizer was ADAMW_TORCH_FUSED with its configured beta and epsilon values, paired with a cosine learning rate scheduler and a 0.1 warmup ratio.
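For reference, the reported hyperparameters map onto Hugging Face TrainingArguments roughly as sketched below. This is not the exact training script; the output directory is a placeholder, and values not stated in this card (optimizer betas/epsilon, precision) are marked as assumptions.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; unreported settings are assumptions.
training_args = TrainingArguments(
    output_dir="sft__exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B",  # placeholder
    num_train_epochs=7,
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # per-GPU batch size
    gradient_accumulation_steps=3,   # 1 x 3 x 32 GPUs = effective batch of 96
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                       # assumption: mixed precision on the multi-GPU run
)
```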
Key Characteristics
- Base Model: Qwen3-8B
- Parameter Count: 8 billion
- Context Length: 32768 tokens
- Fine-tuning Dataset: exp-psu-swesmith-1K_glm_4.7_traces_jupiter, a specialized dataset; the model's strengths are expected to track the tasks represented in that data.
Potential Use Cases
Because the model was fine-tuned on a single specialized dataset, it is likely best suited to tasks that align closely with that data. Developers should inspect the exp-psu-swesmith-1K_glm_4.7_traces_jupiter dataset to understand the model's specific strengths and intended applications.
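One way to do that inspection is sketched below. The Hub id is inferred from the local snapshot path in the training configuration (datasets--DCAgent--exp-psu-swesmith-1K_glm_4.7_traces_jupiter) and is an assumption; the dataset may be private, gated, or named differently.

```python
from datasets import load_dataset

# Assumed Hub id, inferred from the cached snapshot path above.
ds = load_dataset("DCAgent/exp-psu-swesmith-1K_glm_4.7_traces_jupiter", split="train")

print(ds)        # dataset size and column names
print(ds[0])     # inspect one trace to see the fields and formatting used for SFT
```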