Model Overview
DCAgent/c1_kimi_k2.5_fixed is an 8-billion-parameter language model fine-tuned from the base model Qwen/Qwen3-8B. Developed by DCAgent, it supports a 32,768-token context window, enabling it to process and generate long sequences of text.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen3-8B.
- Parameter Count: 8 billion parameters.
- Context Length: Supports a 32,768 token context window.
- Training Data: Fine-tuned on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--c1_kimi_k2.5_fixed/snapshots/5807137b49d0d1d27e7b100da3e8d4156ddb94e3_thinking_preprocessed dataset; the `_thinking_preprocessed` suffix suggests a focus on reasoning-style ("thinking") data.
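Since prompt and completion must share the 32,768-token window, a small helper can show how much room is left for generation given a prompt length. This is an illustrative sketch, not part of the model's tooling; the function name and `reserved` parameter are assumptions.

```python
# Hypothetical helper: budget generation length within the model's
# 32,768-token context window (prompt + generated tokens must fit).
CONTEXT_WINDOW = 32_768

def generation_budget(prompt_tokens: int, reserved: int = 0) -> int:
    """Return how many new tokens can be generated for a prompt of
    `prompt_tokens` tokens, optionally reserving `reserved` tokens
    (e.g. for a system prompt added later)."""
    remaining = CONTEXT_WINDOW - prompt_tokens - reserved
    return max(remaining, 0)

print(generation_budget(30_000))  # 2768 tokens left for generation
print(generation_budget(33_000))  # prompt already overflows -> 0
```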
Training Details
The model was trained with a learning rate of 4e-05 in a distributed setup across 16 devices, with a total training batch size of 16 (one example per device per step, assuming no gradient accumulation). The optimizer was ADAMW_TORCH_FUSED with a cosine learning-rate scheduler and a warmup ratio of 0.1, trained for 7 epochs.
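The reported hyperparameters can be collected into a config sketch. Key names below follow `transformers.TrainingArguments` conventions; the actual launch script is not part of this card, so treat this as an assumption-laden illustration rather than the exact setup used.

```python
# Reported hyperparameters, expressed as an HF-style config dict
# (field names are assumed, following TrainingArguments conventions).
training_config = {
    "learning_rate": 4e-05,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 7,
    "world_size": 16,               # number of devices
    "total_train_batch_size": 16,
}

# With 16 devices and a total batch of 16, each device processes one
# example per optimizer step (assuming no gradient accumulation).
per_device_batch = (
    training_config["total_train_batch_size"] // training_config["world_size"]
)
print(per_device_batch)  # 1
```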
Potential Use Cases
Given its fine-tuning on a "thinking"-related dataset and its large context window, this model may be particularly effective for:
- Complex Reasoning Tasks: Analyzing and generating text that involves logical steps or internal monologues.
- Long-form Content Generation: Creating detailed narratives, reports, or conversational turns that require maintaining context over extended periods.
- Specialized Conversational AI: Developing agents that can simulate or understand complex thought processes.
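For the long-form and conversational use cases above, one practical concern is keeping a running transcript inside the 32,768-token window. The sketch below drops the oldest turns when the budget is exceeded; `count_tokens` is a whitespace-based stand-in (an assumption), and a real deployment would use the model's own tokenizer instead.

```python
# Hypothetical sliding-window helper: keep only the most recent
# conversation turns that fit within the context budget.
from collections import deque

CONTEXT_WINDOW = 32_768

def count_tokens(text: str) -> int:
    # Rough stand-in: whitespace word count. A real system would use
    # the model's tokenizer here.
    return len(text.split())

def trim_history(turns: list[str], budget: int = CONTEXT_WINDOW) -> list[str]:
    """Keep the most recent turns whose combined token count fits `budget`."""
    kept: deque[str] = deque()
    total = 0
    # Walk backwards from the newest turn, stopping once the budget fills.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if total + cost > budget:
            break
        kept.appendleft(turn)
        total += cost
    return list(kept)

history = ["one two three", "four five", "six"]
print(trim_history(history, budget=4))  # ['four five', 'six']
```

Trimming whole turns (rather than truncating mid-turn) keeps each retained message intact, which matters when the model is expected to follow multi-step reasoning across the conversation.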