laion/exp-swd-r2egym-wo-docker_glm_4_7_traces

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Jan 21, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The laion/exp-swd-r2egym-wo-docker_glm_4_7_traces model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the DCAgent/exp-swd-r2egym-wo-docker_glm_4.7_traces dataset, a targeted trace dataset that points to a specialization in the tasks it represents. The model supports a context length of 32768 tokens.


Overview

This model, exp-swd-r2egym-wo-docker_glm_4_7_traces, is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It has been trained specifically on the DCAgent/exp-swd-r2egym-wo-docker_glm_4.7_traces dataset, indicating a focus on the tasks and domains represented in that data, and it supports a context length of 32768 tokens.
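Because the model inherits the Qwen3-8B architecture, it should load with the standard Hugging Face Transformers causal-LM API. The sketch below is an assumption based on the card's metadata (repository id, chat-style prompting); it is not an official usage example, and dtype/device settings should be adjusted to your hardware.

```python
# Minimal loading sketch, assuming the standard Transformers causal-LM interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/exp-swd-r2egym-wo-docker_glm_4_7_traces"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place the 8B weights on available GPUs
)

# Chat-style prompting via the tokenizer's chat template (inherited from Qwen3-8B).
messages = [{"role": "user", "content": "Summarize the purpose of this model."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```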

Key Capabilities

  • Specialized Fine-tuning: Optimized for performance on data similar to the DCAgent/exp-swd-r2egym-wo-docker_glm_4.7_traces dataset.
  • Large Context Window: Capable of processing inputs up to 32768 tokens, beneficial for tasks requiring extensive context.
  • Qwen3-8B Base: Inherits the foundational capabilities of the Qwen3-8B model.

Training Details

The model was trained using a learning rate of 4e-05, a total batch size of 16 (with gradient accumulation steps of 2), and the AdamW_Torch_Fused optimizer. A cosine learning rate scheduler with a 0.1 warmup ratio was employed over 7 epochs. The training utilized 8 GPUs, leveraging Transformers 4.57.3 and PyTorch 2.9.0+cu128.
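For reference, these hyperparameters map onto a standard Transformers Trainer configuration roughly as follows. This is a hedged reconstruction, not the authors' training script: only the values reported above are taken from the card, while the output directory, per-device batch size, and bf16 setting are assumptions (a per-device batch size of 1 across 8 GPUs with 2 accumulation steps is one way to reach a total batch size of 16).

```python
# Hypothetical TrainingArguments mirroring the reported hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="exp-swd-r2egym-wo-docker_glm_4_7_traces",  # assumed
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # assumed: 1 x 8 GPUs x 2 accumulation = 16 total
    gradient_accumulation_steps=2,
    num_train_epochs=7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    bf16=True,                        # assumed mixed-precision setting
)
```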