laion/qwen3base-GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k
The laion/qwen3base-GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B-Base. It was trained on the GLM-4.7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k dataset, which suggests a specialization in the domain of that training data. With a context length of 32768 tokens, it is designed for tasks requiring extensive contextual understanding.
Model Overview
This model, laion/qwen3base-GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k, is an 8-billion-parameter language model. It is a fine-tuned variant of Qwen/Qwen3-8B-Base, a base model developed by Qwen.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen3-8B-Base.
- Parameter Count: 8 billion parameters.
- Context Length: Supports a substantial context window of 32768 tokens.
- Training Data: Fine-tuned on a specialized dataset, recorded in the training configuration by its local Hugging Face Hub cache path:
/data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/datasets--DCAgent2--GLM-4.7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k/snapshots/e209a88db18950c3ce4e72a45a6088561d99d1bf_thinking_preprocessed
This cache path corresponds to a snapshot of the DCAgent2/GLM-4.7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k dataset (a _thinking_preprocessed variant). A minimal usage sketch follows this list.
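The snippet below is a minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub under the repo id shown above and loads with the standard transformers Auto classes; the prompt and generation settings are purely illustrative and not taken from the model's documentation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, taken from the model name above.
model_id = "laion/qwen3base-GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires `accelerate`; places weights on available GPUs
)

prompt = "Write a Python unit test for a function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation kept short for illustration; tune max_new_tokens as needed.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Because this is a fine-tune of a base model, check whether the checkpoint ships a chat template before relying on apply_chat_template; the plain-prompt call above avoids that assumption.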
Training Details
The model was trained with the following hyperparameters (a configuration sketch reconstructing them follows this list):
- Learning Rate: 4e-05
- Optimizer: ADAMW_TORCH_FUSED
- Epochs: 7.0
- Batch Size: An effective (total) training batch size of 16, using gradient accumulation steps of 2.
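As a rough reconstruction, these values map onto a transformers TrainingArguments configuration as sketched below. This is not the actual training script: the per-device batch size, bf16 precision, and output_dir are assumptions, chosen so that the effective batch size matches the reported value of 16.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters.
# Assumes a single GPU, so per_device_train_batch_size (8) x
# gradient_accumulation_steps (2) = total batch size 16.
training_args = TrainingArguments(
    output_dir="qwen3base-glm47-sft",  # hypothetical output path
    learning_rate=4e-05,
    optim="adamw_torch_fused",         # matches the reported ADAMW_TORCH_FUSED
    num_train_epochs=7.0,
    per_device_train_batch_size=8,     # assumption (see note above)
    gradient_accumulation_steps=2,
    bf16=True,                         # assumption; typical for 8B fine-tunes
)
```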
Intended Use
Given its fine-tuning on this specialized dataset, the model is most likely suited to applications that closely match the nature of its training data. Users should evaluate its performance on their specific use cases; the 32768-token context window makes it a reasonable candidate for tasks that require deep contextual understanding.