laion/exp-syh-r2egym-swesmith-mixed_glm_4_7_traces_locetash

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Feb 9, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The laion/exp-syh-r2egym-swesmith-mixed_glm_4_7_traces_locetash model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the DCAgent/exp-syh-r2egym-swesmith-mixed_glm_4.7_traces_locetash dataset, with a context length of 32768 tokens. It is a specialized adaptation of the Qwen3-8B architecture, intended for tasks that match its fine-tuning data.


Overview

This model, exp-syh-r2egym-swesmith-mixed_glm_4_7_traces_locetash, is an 8-billion-parameter language model derived from the Qwen/Qwen3-8B architecture. Its fine-tuning on the DCAgent/exp-syh-r2egym-swesmith-mixed_glm_4.7_traces_locetash dataset suggests a specialization for tasks matching that dataset's characteristics. The 32768-token context window allows the model to process and generate long sequences of text.
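
As a minimal usage sketch, the checkpoint should be loadable through the standard Hugging Face transformers API it inherits from Qwen3-8B. Only the model ID below comes from this card; the prompt and generation settings are illustrative, and FP8 weights may require compatible hardware or kernels:

```python
# Minimal inference sketch; assumes the checkpoint follows the standard
# transformers format inherited from Qwen3-8B.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/exp-syh-r2egym-swesmith-mixed_glm_4_7_traces_locetash"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # defer to the dtype stored in the checkpoint config
    device_map="auto",
)

# Illustrative prompt; the card does not document a specific prompt format
# beyond the Qwen3 chat template.
messages = [{"role": "user", "content": "Explain what a stack trace is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```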

Key Capabilities

  • Specialized Fine-tuning: Adapted from Qwen3-8B, indicating potential for enhanced performance on tasks similar to its fine-tuning data.
  • Extended Context Window: Capable of handling inputs up to 32768 tokens, beneficial for applications requiring extensive contextual understanding.

Good For

  • Use cases that align with the specific data distribution and patterns found in the DCAgent/exp-syh-r2egym-swesmith-mixed_glm_4.7_traces_locetash dataset.
  • Applications requiring processing of long documents or conversations due to its large context window.

Training Details

The model was trained for 7 epochs with a learning rate of 4e-05, a per-device batch size of 1 across 8 GPUs (effective batch size 16 via gradient accumulation, i.e. 2 accumulation steps), and a cosine learning-rate scheduler with a warmup ratio of 0.1. The optimizer was adamw_torch_fused (PyTorch's fused AdamW implementation).
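
The card does not state which training framework was used, but the reported hyperparameters map directly onto Hugging Face TrainingArguments. The sketch below is only a reconstruction of the numbers listed above; the output directory and precision setting are assumptions:

```python
# Reconstruction of the reported hyperparameters as TrainingArguments.
# The actual training stack is not documented on this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="exp-syh-r2egym-swesmith-mixed",  # hypothetical path
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # run on 8 GPUs
    gradient_accumulation_steps=2,   # 1 x 8 GPUs x 2 = 16 effective batch size
    num_train_epochs=7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    bf16=True,                       # assumption; training precision not stated
)
```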