laion/exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B
The laion/exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B model is an 8-billion-parameter language model fine-tuned from the Qwen3-8B architecture. It was trained on the /e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent--exp-psu-swesmith-1K_glm_4.7_traces_jupiter/snapshots/24c8342833108c3a15a23b64f37b83ff7e65efa4_thinking_preprocessed dataset. It is a specialized fine-tune: its main differentiation comes from this particular training data and process rather than from broad general-purpose capabilities.
Overview
This model, sft__exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B, is an 8-billion-parameter language model derived from the Qwen3-8B base architecture. It was fine-tuned on the /e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent--exp-psu-swesmith-1K_glm_4.7_traces_jupiter/snapshots/24c8342833108c3a15a23b64f37b83ff7e65efa4_thinking_preprocessed dataset.
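If the model is published under the repo id in the title, it should load through the standard transformers interface inherited from Qwen3-8B. The snippet below is a minimal sketch, assuming that repo id is available on the Hugging Face Hub and that the base model's chat template is retained; the prompt is only an illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id (taken from the model name in this card).
model_id = "laion/exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Explain what a unit test is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```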
Training Details
The fine-tuning was conducted over 7 epochs with a learning rate of 4e-05. Key hyperparameters included a per-device train_batch_size of 1 and gradient_accumulation_steps of 3; across the 32-GPU multi-device setup this yields a total_train_batch_size of 96 (1 × 3 × 32). The optimizer was ADAMW_TORCH_FUSED with its configured beta and epsilon values, paired with a cosine learning rate scheduler and a 0.1 warmup ratio.
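For reference, the reported hyperparameters map onto Hugging Face TrainingArguments roughly as sketched below. This is not the exact training script; the output directory is a placeholder, and values not stated in this card (optimizer betas/epsilon, precision) are marked as assumptions.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; unreported settings are assumptions.
training_args = TrainingArguments(
    output_dir="sft__exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B",  # placeholder
    num_train_epochs=7,
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # per-GPU batch size
    gradient_accumulation_steps=3,   # 1 x 3 x 32 GPUs = effective batch of 96
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                       # assumption: mixed precision on the multi-GPU run
)
```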
Key Characteristics
- Base Model: Qwen3-8B
- Parameter Count: 8 billion
- Context Length: 32768 tokens
- Fine-tuning Dataset: exp-psu-swesmith-1K_glm_4.7_traces_jupiter, a specialized dataset; the model's strengths are expected to track the tasks represented in that data.
Potential Use Cases
Because the model was fine-tuned on a single specialized dataset, it is likely best suited to tasks that align closely with that data. Developers should inspect the exp-psu-swesmith-1K_glm_4.7_traces_jupiter dataset to understand the model's specific strengths and intended applications.
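One way to do that inspection is sketched below. The Hub id is inferred from the local snapshot path in the training configuration (datasets--DCAgent--exp-psu-swesmith-1K_glm_4.7_traces_jupiter) and is an assumption; the dataset may be private, gated, or named differently.

```python
from datasets import load_dataset

# Assumed Hub id, inferred from the cached snapshot path above.
ds = load_dataset("DCAgent/exp-psu-swesmith-1K_glm_4.7_traces_jupiter", split="train")

print(ds)        # dataset size and column names
print(ds[0])     # inspect one trace to see the fields and formatting used for SFT
```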