DCAgent/c1_kimi_k2.5

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Apr 9, 2026 | License: other | Architecture: Transformer | Cold

DCAgent/c1_kimi_k2.5 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was adapted using a "thinking preprocessed" dataset, which suggests a focus on reasoning and multi-step task execution. It supports a 32768-token context window.

Overview

DCAgent/c1_kimi_k2.5 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on a "thinking preprocessed" dataset, which suggests a focus on explicit reasoning and complex, multi-step tasks. Fine-tuning used a learning rate of 4e-05 and ran for 7 epochs on 16 devices.

Key Characteristics

  • Base Model: Qwen/Qwen3-8B
  • Parameter Count: 8 billion
  • Context Length: 32768 tokens
  • Fine-tuning Data: A "thinking preprocessed" dataset, suggesting an emphasis on reasoning-intensive tasks.
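
The characteristics above can be turned into a small loading and budgeting sketch. It assumes the model is published on the Hugging Face Hub under the id in this card's title and that it loads with the standard Transformers `AutoTokenizer` and `AutoModelForCausalLM` classes; neither is confirmed by this card. The helper names `max_new_tokens_for` and `load_model` are hypothetical and exist only for illustration.

```python
MODEL_ID = "DCAgent/c1_kimi_k2.5"  # id taken from this card's title (Hub availability is an assumption)
CONTEXT_LENGTH = 32768             # context window stated in this card


def max_new_tokens_for(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated once the prompt is counted."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Prompt and generated tokens share the same context window.
    return max(context_length - prompt_tokens, 0)


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model (downloads weights on first use).

    The import is deferred so the budgeting helper above works without
    the transformers package installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

For example, a 30000-token prompt would leave `max_new_tokens_for(30000)` tokens of headroom, which matters for a model that may spend part of its output on reasoning before answering.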

Training Details

The model was trained for 7 epochs with a learning rate of 4e-05 and a total batch size of 16 across 16 devices, using the AdamW optimizer with a cosine learning rate scheduler. Training used Transformers 4.57.6 and PyTorch 2.9.1+cu130.
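
To make the schedule and batch figures concrete, here is a minimal sketch of a cosine learning rate decay and the implied per-device batch size. The card does not state warmup steps, a minimum learning rate, or gradient accumulation, so the sketch assumes no warmup by default, decay to zero, and one accumulation step; the function names are hypothetical and this is not the actual training code.

```python
import math

PEAK_LR = 4e-05        # learning rate stated in this card
TOTAL_BATCH_SIZE = 16  # total batch size stated in this card
NUM_DEVICES = 16       # device count stated in this card


def per_device_batch_size(total: int = TOTAL_BATCH_SIZE,
                          devices: int = NUM_DEVICES,
                          grad_accum_steps: int = 1) -> int:
    """Per-device batch size implied by the total (grad_accum_steps=1 is an assumption)."""
    return total // (devices * grad_accum_steps)


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, warmup_steps: int = 0) -> float:
    """Cosine decay from peak_lr to zero, with optional linear warmup (assumed absent)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions each device processes one sample per step, and the learning rate falls to half its peak at the midpoint of training.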

Potential Use Cases

Because it was fine-tuned on a "thinking preprocessed" dataset, this model is likely best suited to:

  • Complex problem-solving
  • Reasoning and logical inference tasks
  • Applications requiring structured thought processes
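
For applications that need the final answer separately from the model's reasoning, a small post-processing step may help. The sketch below assumes this fine-tune keeps the `<think>...</think>` delimiters that the Qwen3 base model uses for its reasoning trace; this card does not confirm the output format, so check real outputs before relying on it. The function name `split_reasoning` is hypothetical.

```python
import re
from typing import Tuple

# Assumed delimiter format, inherited from the Qwen3 base model.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split generated text into (reasoning, answer).

    If no <think> block is found, the reasoning is empty and the whole
    text is treated as the answer.
    """
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

Keeping the two parts separate lets an application log the reasoning for debugging while showing only the answer to end users.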