Overview
This model, laion/glm-4_6-all-puzzles-32ep-131k, is an 8-billion-parameter language model built on the Qwen/Qwen3-8B architecture. It was fine-tuned on the penfever/glm-4.6-all-puzzles-32ep-131k dataset, specializing it in puzzle solving and logical reasoning.
Training Details
The model was trained for 7 epochs with the AdamW optimizer at a learning rate of 4e-05. Each of the 8 GPUs used a train_batch_size of 1 with gradient_accumulation_steps of 2, yielding a total_train_batch_size of 16. A cosine learning rate scheduler with a warmup ratio of 0.1 was employed.
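The effective batch size arithmetic and the scheduler shape can be sketched as follows. This is an illustration, not the training code: the `cosine_lr` helper assumes the common linear-warmup-then-cosine-decay-to-zero schedule, which the card's description matches but does not spell out.

```python
import math

# Hyperparameters from the training run described above.
LEARNING_RATE = 4e-05
WARMUP_RATIO = 0.1
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 2
NUM_GPUS = 8

# Effective (total) train batch size: per-device batch x accumulation x GPUs.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS  # 1 * 2 * 8 = 16

def cosine_lr(step, total_steps, base_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Linear warmup followed by cosine decay to zero (an assumed,
    standard scheduler shape consistent with the card's description)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch)       # 16
print(cosine_lr(100, 1000))  # end of warmup: the full 4e-05
print(cosine_lr(1000, 1000)) # final step: decayed to ~0.0
```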
Key Characteristics
- Base Model: Qwen/Qwen3-8B
- Parameter Count: 8 billion
- Context Length: 32768 tokens
- Specialization: Fine-tuned on a puzzle-oriented dataset, suggesting enhanced performance in reasoning and problem-solving tasks.
Intended Use
While the original model card does not detail specific intended uses or limitations, fine-tuning on a puzzle dataset implies suitability for applications requiring logical deduction, pattern recognition, and structured problem solving. Developers should evaluate the model on puzzle-related benchmarks relevant to their use case before deployment.
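Such an evaluation can be sketched with a minimal exact-match harness. The `generate` callable and the demo items below are hypothetical placeholders: substitute your own inference wrapper around the model and a real puzzle benchmark.

```python
def evaluate_puzzles(generate, dataset):
    """Score a model on puzzle items by case-insensitive exact match.

    `generate` is any callable mapping a prompt string to an answer
    string (e.g. a wrapper around your inference code); `dataset` is a
    list of (prompt, expected_answer) pairs. Both are placeholders.
    """
    correct = 0
    for prompt, expected in dataset:
        answer = generate(prompt).strip().lower()
        if answer == expected.strip().lower():
            correct += 1
    return correct / len(dataset) if dataset else 0.0

# Tiny illustrative run with a stub "model" standing in for real inference.
demo = [("What is 2 + 2?", "4"), ("Spell 'cat' backwards.", "tac")]
stub = lambda prompt: "4" if "2 + 2" in prompt else "dog"
print(evaluate_puzzles(stub, demo))  # 0.5
```

Real benchmarks would typically need answer extraction from free-form generations rather than strict exact match, but the accuracy loop is the same.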