laion/glm-4_6-all-puzzles-32ep-131k

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · License: apache-2.0 · Architecture: Transformer · Open weights

The laion/glm-4_6-all-puzzles-32ep-131k model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the penfever/glm-4.6-all-puzzles-32ep-131k dataset for 7 epochs. The fine-tuning targets puzzle-solving tasks, with the specialized dataset intended to strengthen the model's reasoning in that domain.


Overview

laion/glm-4_6-all-puzzles-32ep-131k is an 8-billion-parameter language model based on Qwen/Qwen3-8B. It was fine-tuned on the penfever/glm-4.6-all-puzzles-32ep-131k dataset, indicating a specialization in puzzles and logical problem solving.

Training Details

The model was trained for 7 epochs with the AdamW optimizer at a learning rate of 4e-05. Key hyperparameters were a per-device train_batch_size of 1 and gradient_accumulation_steps of 2 across 8 GPUs, giving a total_train_batch_size of 16 (1 × 2 × 8). A cosine learning-rate scheduler with a warmup ratio of 0.1 was used.
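As a rough illustration, these hyperparameters map onto Hugging Face `TrainingArguments` roughly as sketched below. The original training script is not published in the model card, so the `output_dir` and the exact AdamW variant are assumptions for illustration only.

```python
# Sketch: mapping the reported hyperparameters onto Hugging Face TrainingArguments.
# output_dir and the optim flavour are assumptions; only the numeric values come from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="glm-4_6-all-puzzles-32ep-131k",  # illustrative output path
    num_train_epochs=7,                          # reported number of epochs
    learning_rate=4e-5,                          # reported learning rate
    per_device_train_batch_size=1,               # reported train_batch_size
    gradient_accumulation_steps=2,               # reported accumulation steps
    lr_scheduler_type="cosine",                  # cosine schedule
    warmup_ratio=0.1,                            # 10% warmup
    optim="adamw_torch",                         # AdamW optimizer (exact variant assumed)
)
# Effective batch size with 8 GPUs: 1 (per device) x 2 (accumulation) x 8 (GPUs) = 16.
```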

Key Characteristics

  • Base Model: Qwen/Qwen3-8B
  • Parameter Count: 8 billion
  • Context Length: 32768 tokens
  • Specialization: Fine-tuned on a puzzle-oriented dataset, suggesting enhanced performance in reasoning and problem-solving tasks.

Intended Use

While specific intended uses and limitations are not detailed in the original model card, its fine-tuning on a puzzle dataset implies suitability for applications requiring logical deduction, pattern recognition, and problem-solving capabilities. Developers should evaluate its performance on specific puzzle-related benchmarks relevant to their use case.
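For a quick start, the model can be loaded with the standard transformers text-generation API, as in the minimal sketch below. The prompt and generation settings are placeholders rather than recommendations from the model card, and the sketch assumes the tokenizer inherits Qwen3's chat template.

```python
# Illustrative inference sketch using the standard transformers API.
# Prompt and generation settings are placeholders, not from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/glm-4_6-all-puzzles-32ep-131k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Example puzzle-style query (assumes the Qwen3 chat template is present).
messages = [{"role": "user", "content": "If all bloops are razzes and all razzes are lazzes, are all bloops lazzes?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```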