laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k
The laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the DCAgent2/GLM-4.7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k dataset. The dataset name suggests a specialization in agent-based software-engineering tasks run in sandboxed environments with oracle-verified tests. The model is therefore likely optimized for interacting with defined test or simulation frameworks, and supports a 32768-token context length.
Model Overview
This model, laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k, is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B checkpoint using the DCAgent2/GLM-4.7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k dataset.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen3-8B.
- Parameter Count: 8 billion parameters.
- Context Length: Supports a substantial context window of 32768 tokens.
- Training Data: Fine-tuned on a specialized dataset whose name points to sandboxed software-engineering tasks with oracle-verified tests, suggesting optimization for interactive, agent-based workflows.
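Since the base model is Qwen/Qwen3-8B, the checkpoint can presumably be loaded with standard Hugging Face transformers APIs. The sketch below shows one way to do this; the chat-template usage and generation settings are illustrative assumptions, not taken from the model card, and the imports are deferred into the function so the sketch can be defined without transformers installed.

```python
MODEL_ID = "laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Hedged sketch: load the checkpoint and produce one completion.

    Calling this downloads the full 8B-parameter weights, so the heavy
    imports and model loading are kept inside the function.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Assumption: the tokenizer ships a chat template (true for Qwen3 bases).
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

For agent or sandbox use the prompt would typically carry the environment's observation and tool descriptions, but the exact prompting format is not documented in the card.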
Training Details
The fine-tuning process used a learning rate of 4e-05 and a total train batch size of 16 (8 GPUs with 2 gradient accumulation steps), with a cosine learning-rate scheduler and a warmup ratio of 0.1. Training ran for 7 epochs using the ADAMW_TORCH_FUSED optimizer.
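The schedule described above can be sketched in plain Python. This mirrors the shape of transformers' `get_cosine_schedule_with_warmup` (linear warmup to the peak rate, then cosine decay to zero); the total step count below is a hypothetical placeholder, since the card does not report it.

```python
import math

PEAK_LR = 4e-05                # learning rate from the training details
NUM_DEVICES = 8                # 8 GPUs
GRAD_ACCUM = 2                 # gradient accumulation steps
PER_DEVICE_BATCH = 1           # assumption: implied by the reported total of 16
TOTAL_BATCH = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM  # effective batch size 16

def lr_at(step: int, total_steps: int, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear warmup over the first 10% of
    steps, then cosine decay from PEAK_LR down to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000  # hypothetical total optimizer steps
print(TOTAL_BATCH)        # 16
print(lr_at(100, total))  # 4e-05 (peak rate at the end of warmup)
```

The warmup ratio of 0.1 means the first tenth of optimizer steps ramp the rate linearly; afterwards the cosine term decays it smoothly to zero by the final step.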
Potential Use Cases
Given its fine-tuning dataset, this model is likely suited for applications involving:
- Agent-based systems: Interacting within defined environments or simulations.
- Sandbox testing: Generating or interpreting actions within constrained, verifiable systems.
- Automated verification: Tasks whose outputs are checked against an oracle, likely under a per-run time limit (the 120s in the dataset name).
Further details on specific intended uses and limitations are not provided in the current model description.