laion/glm46-swesmith-maxeps-131k-fixthink

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Feb 16, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

laion/glm46-swesmith-maxeps-131k-fixthink is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the /data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/datasets--penfever--glm46-swesmith-maxeps-131k/snapshots/4d4c2d4a9d21f73870ed31c7bc6028035b3b6ca7_thinking_preprocessed dataset and supports a 32,768-token (32k) context window. It is intended for tasks that benefit from this specialized fine-tuning.


Overview

glm46-swesmith-maxeps-131k-fixthink is an 8-billion-parameter language model published by laion. It is a fine-tuned variant of Qwen/Qwen3-8B, adapted by training on the dataset at /data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/datasets--penfever--glm46-swesmith-maxeps-131k/snapshots/4d4c2d4a9d21f73870ed31c7bc6028035b3b6ca7_thinking_preprocessed (the path indicates a "thinking"-preprocessed snapshot of the penfever/glm46-swesmith-maxeps-131k dataset). The fine-tuning is intended to give the model the behaviors represented in that training data.
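Assuming the model is published on the Hugging Face Hub under the id shown in the page title (laion/glm46-swesmith-maxeps-131k-fixthink), a minimal loading-and-generation sketch with the `transformers` library might look like the following. The Hub id, dtype, and generation settings here are illustrative assumptions, not confirmed by the card:

```python
# Hub id taken from the page title; not verified to be publicly downloadable.
MODEL_ID = "laion/glm46-swesmith-maxeps-131k-fixthink"
MAX_CONTEXT = 32768  # context window stated on this card


def load_model(model_id=MODEL_ID, device_map="auto"):
    """Load tokenizer and model; the heavy download happens here."""
    # transformers is imported lazily so the sketch can be read/imported
    # without the library (or the 8B weights) present.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map=device_map
    )
    return tok, model


def generate(tok, model, prompt, max_new_tokens=256):
    """Greedy generation; returns only the newly generated text."""
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = out[0][inputs["input_ids"].shape[-1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

In practice, prompts plus generated tokens should stay under the 32k context budget, and `device_map="auto"` lets accelerate shard the 8B weights across available GPUs.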

Training Details

The model was trained for 7 epochs with a learning rate of 4e-05 and a total batch size of 16 (per-device train_batch_size of 1 with gradient_accumulation_steps of 2 across 8 GPUs). The optimizer was ADAMW_TORCH_FUSED with a cosine learning-rate scheduler and a warmup ratio of 0.1. Training used Transformers 4.57.6, PyTorch 2.9.0+cu128, Datasets 4.4.1, and Tokenizers 0.22.2.
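The stated total batch size follows directly from the per-device settings. A small sketch of that arithmetic, using only the values reported above (collected into a config dict for convenience):

```python
# Hyperparameters as reported on this card (not independently verified).
per_device_train_batch_size = 1
gradient_accumulation_steps = 2
num_gpus = 8

# Effective (total) batch size = micro-batch x accumulation steps x GPU count.
total_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(total_batch_size)  # 16, matching the total batch size stated above

# Remaining reported settings, in the key style used by transformers' Trainer.
training_config = {
    "learning_rate": 4e-05,
    "num_train_epochs": 7,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
}
```

With a warmup ratio of 0.1, the first 10% of optimizer steps ramp the learning rate up before the cosine decay begins.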

Potential Use Cases

Given its fine-tuning on this specific dataset, the model is likely best suited to applications that align with the nature and content of the /data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/datasets--penfever--glm46-swesmith-maxeps-131k/snapshots/4d4c2d4a9d21f73870ed31c7bc6028035b3b6ca7_thinking_preprocessed dataset. Developers should evaluate it on tasks requiring the knowledge or generation style imparted by that training data before deploying it.