laion/glm46-swesmith-maxeps-131k

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Dec 15, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

The laion/glm46-swesmith-maxeps-131k model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the DCAgent2/glm46-neulab-mind2web-sandboxes-maxeps-131k dataset, suggesting a specialization in agentic tasks and interactions within sandboxed web environments.


Model Overview

laion/glm46-swesmith-maxeps-131k builds on the Qwen/Qwen3-8B base and was specialized through fine-tuning on the DCAgent2/glm46-neulab-mind2web-sandboxes-maxeps-131k dataset, which points to a focus on agent-based interactions, web environments, and sandbox simulations.
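
Since the checkpoint ships as standard open weights, it should load with the usual transformers text-generation flow. The snippet below is a minimal sketch, assuming the tokenizer inherits a Qwen3-style chat template; the prompt content and dtype choice are illustrative, not part of the card.

```python
# Minimal inference sketch (assumes the standard transformers API and an
# inherited Qwen3-style chat template; the prompt is illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/glm46-swesmith-maxeps-131k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; FP8 checkpoints may need a supporting backend
    device_map="auto",
)

messages = [{"role": "user", "content": "Open the search page and query 'laion'."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```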

Training Details

The model was trained for 7 epochs with a learning rate of 4e-05 and an effective batch size of 16 (train_batch_size 1 × gradient_accumulation_steps 2 × 8 devices). Optimization used ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08, under a cosine learning-rate scheduler with a 0.1 warmup ratio. Training ran on Transformers 4.57.3, PyTorch 2.9.0+cu128, Datasets 4.4.1, and Tokenizers 0.22.1.
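
Expressed as a transformers TrainingArguments configuration, the reported hyperparameters map roughly onto the sketch below. This is a reconstruction for readability, not the authors' actual training script, and the output directory name is made up.

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters; not the authors' script.
training_args = TrainingArguments(
    output_dir="glm46-swesmith-maxeps-131k",  # hypothetical path
    num_train_epochs=7,
    learning_rate=4e-05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,  # x 8 devices -> effective batch size 16
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```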

Potential Use Cases

Given its fine-tuning dataset, this model is likely best suited for applications involving the following (a deployment sketch follows the list):

  • Agentic task execution: Interacting with environments or performing actions based on instructions.
  • Web-based interactions: Understanding or generating content related to web interfaces or sandboxed web environments.
  • Specialized data processing: Tasks that align with the characteristics of the mind2web-sandboxes dataset.
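
For agentic or multi-turn workloads like these, a throughput-oriented inference server is the usual deployment path. The card does not state a serving stack, so the vLLM sketch below is an assumption; max_model_len matches the advertised 32k context, and the prompt is illustrative.

```python
# Minimal serving sketch with vLLM (an assumption; the card names no stack).
from vllm import LLM, SamplingParams

llm = LLM(model="laion/glm46-swesmith-maxeps-131k", max_model_len=32768)
params = SamplingParams(temperature=0.2, max_tokens=512)

prompts = ["Navigate to the settings page and enable dark mode."]  # illustrative
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```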