laion/dev_set_part1_10k_glm_4_7_traces_jupiter

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Concurrency cost: 1 · Published: Feb 23, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

The laion/dev_set_part1_10k_glm_4_7_traces_jupiter model is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the /data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/datasets--DCAgent--dev_set_part1_10k_glm_4.7_traces_jupiter/snapshots/f1871d1c1446b3b43cbfe2737d0df56cecf3f420_thinking_preprocessed dataset. This model is designed for tasks related to its specific fine-tuning data, offering a specialized application of the Qwen3-8B architecture.


Overview

This model, laion/dev_set_part1_10k_glm_4_7_traces_jupiter, is an 8 billion parameter language model based on Qwen/Qwen3-8B. It was fine-tuned on the dataset located at /data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/datasets--DCAgent--dev_set_part1_10k_glm_4.7_traces_jupiter/snapshots/f1871d1c1446b3b43cbfe2737d0df56cecf3f420_thinking_preprocessed.
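
A minimal usage sketch for loading the checkpoint with Hugging Face Transformers is shown below. It assumes the model exposes the standard Qwen3-style causal-LM and chat-template interface; the prompt and generation settings are illustrative and are not taken from the model card.

```python
# Minimal inference sketch, assuming the standard Transformers causal-LM interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/dev_set_part1_10k_glm_4_7_traces_jupiter"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # honor the checkpoint's stored precision
    device_map="auto",    # place weights on available GPU(s)/CPU
)

# Build a chat-formatted prompt (illustrative content, not from the card).
messages = [{"role": "user", "content": "Summarize the Qwen3-8B architecture in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```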

Training Details

The fine-tuning run used the following key hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 4e-05
  • Batch Size: 1 (train), 8 (eval)
  • Gradient Accumulation: 2 steps, leading to a total effective batch size of 16
  • Optimizer: ADAMW_TORCH_FUSED (beta and epsilon values not listed in the card)
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
  • Epochs: 7.0
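
Expressed as Hugging Face TrainingArguments, the listed settings correspond roughly to the sketch below. The output directory is a placeholder, and the beta/epsilon values and device count are not given in the card, so library defaults are left in place; this is illustrative, not the exact training configuration.

```python
# Hyperparameter sketch mirroring the values listed above (assumptions noted inline).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="glm_4_7_traces_jupiter_ft",  # hypothetical output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,           # total effective batch of 16 implies ~8 devices (assumption)
    optim="adamw_torch_fused",               # ADAMW_TORCH_FUSED; betas/epsilon left at library defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7.0,
)
```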

Intended Use

Specific intended uses and limitations are not documented. Because the model was fine-tuned on a specialized dataset, it is likely best suited to tasks that resemble that data; developers should review the origin and contents of the training data when evaluating the model's suitability for their applications.