laion/exp-swd-r2egym-wo-docker_glm_4_7_traces
laion/exp-swd-r2egym-wo-docker_glm_4_7_traces is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the DCAgent/exp-swd-r2egym-wo-docker_glm_4.7_traces dataset, and it supports a context length of 32768 tokens.
Overview
This model, exp-swd-r2egym-wo-docker_glm_4_7_traces, is an 8 billion parameter language model derived from the Qwen/Qwen3-8B architecture. It has been specifically fine-tuned on the DCAgent/exp-swd-r2egym-wo-docker_glm_4.7_traces dataset, indicating a specialized focus on tasks or domains represented within that dataset. The model supports a substantial context length of 32768 tokens.
Key Capabilities
- Specialized Fine-tuning: Optimized for performance on data similar to the DCAgent/exp-swd-r2egym-wo-docker_glm_4.7_traces dataset.
- Large Context Window: Capable of processing inputs up to 32768 tokens, beneficial for tasks requiring extensive context.
- Qwen3-8B Base: Inherits the foundational capabilities of the Qwen3-8B model.
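Since the card lists no usage snippet, here is a minimal inference sketch using the Hugging Face Transformers library. The model id and context length come from this card; the dtype, device-map, and generation settings are assumptions, not documented defaults for this model.

```python
MODEL_ID = "laion/exp-swd-r2egym-wo-docker_glm_4_7_traces"
MAX_CONTEXT = 32768  # context length stated in this card

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imports kept local so the sketch reads without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # assumed; pick an explicit dtype for production
        device_map="auto",    # assumed; places weights on available devices
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        outputs[0][inputs.shape[-1]:], skip_special_tokens=True
    )
```

Inputs longer than MAX_CONTEXT tokens should be truncated or chunked before calling `generate`.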
Training Details
The model was trained using a learning rate of 4e-05, a total batch size of 16 (with gradient accumulation steps of 2), and the AdamW_Torch_Fused optimizer. A cosine learning rate scheduler with a 0.1 warmup ratio was employed over 7 epochs. The training utilized 8 GPUs, leveraging Transformers 4.57.3 and PyTorch 2.9.0+cu128.
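The stated totals are mutually consistent if the per-device batch size was 1 (an inference, not stated in the card), since Trainer-style setups compute the total batch size as per-device batch × gradient-accumulation steps × number of GPUs:

```python
def effective_batch_size(per_device: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Total batch size in a data-parallel, gradient-accumulating setup."""
    return per_device * grad_accum_steps * num_gpus

# Per-device batch of 1 is an assumption that matches the card's numbers:
# 1 sample/GPU x 2 accumulation steps x 8 GPUs = 16 total.
assert effective_batch_size(1, 2, 8) == 16
```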