laion/exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Mar 11, 2026 · License: other · Architecture: Transformer

The laion/exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B model is an 8-billion-parameter language model fine-tuned from the Qwen3-8B base model. It was trained on the /e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent--exp-psu-swesmith-1K_glm_4.7_traces_jupiter/snapshots/24c8342833108c3a15a23b64f37b83ff7e65efa4_thinking_preprocessed dataset. It is a specialized fine-tune: its primary differentiation comes from its specific training data and process rather than from broad general-purpose capability.


Overview

This model, sft__exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B, is an 8-billion-parameter language model derived from the Qwen3-8B base model. It was fine-tuned on the dataset /e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent--exp-psu-swesmith-1K_glm_4.7_traces_jupiter/snapshots/24c8342833108c3a15a23b64f37b83ff7e65efa4_thinking_preprocessed.

Training Details

Fine-tuning ran for 7 epochs with a learning rate of 4e-05. Key hyperparameters included a per-device train_batch_size of 1 and gradient_accumulation_steps of 3 across a 32-GPU setup, giving a total_train_batch_size of 96 (1 × 3 × 32). The optimizer was ADAMW_TORCH_FUSED (beta and epsilon values unspecified here), paired with a cosine learning-rate scheduler and a 0.1 warmup ratio.
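The effective batch size above follows from the per-device settings. A minimal sketch of that arithmetic (the variable names mirror common Hugging Face Trainer hyperparameter names; they are illustrative, not taken from the original training script):

```python
# Hedged sketch: reconstructing the total (effective) train batch size
# from the per-device hyperparameters reported in the model card.
per_device_train_batch_size = 1   # samples per GPU per forward pass
gradient_accumulation_steps = 3   # micro-batches accumulated before an optimizer step
num_devices = 32                  # multi-GPU setup reported above

# Effective batch size seen by each optimizer update.
total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # 96, matching the reported value
```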

Key Characteristics

  • Base Model: Qwen3-8B
  • Parameter Count: 8 billion
  • Context Length: 32768 tokens
  • Fine-tuning Dataset: exp-psu-swesmith-1K_glm_4.7_traces_jupiter (thinking-preprocessed), suggesting a focus on tasks represented in that data

Potential Use Cases

Because it was fine-tuned on a single specialized dataset, this model is likely best suited to tasks that align closely with that training data. Developers should inspect the exp-psu-swesmith-1K_glm_4.7_traces_jupiter dataset to understand the model's specific strengths and intended applications.
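For trying the model out, a minimal loading-and-generation sketch with Hugging Face transformers is shown below. It assumes the checkpoint is available on the Hub under the repo id in this card's title; if you only have the local snapshot, pass that directory path instead. The imports are done lazily inside the function so the sketch can be read and checked without transformers installed.

```python
# Hedged usage sketch (assumption: the checkpoint is published under this repo id).
repo_id = "laion/exp-psu-swesmith-1K_glm_4-7_traces_jupiter__Qwen3-8B"
max_context = 32768  # context length stated in the model card


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the fine-tuned model and generate a completion for `prompt`."""
    # Lazy imports: keep the module importable without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Since the fine-tune targets a specific trace dataset, prompts resembling that data are likely to produce the most representative outputs.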