March07/Qwen2-5-Coder-32B-sft-kimi-800

Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Mar 27, 2026 · License: other · Architecture: Transformer

March07/Qwen2-5-Coder-32B-sft-kimi-800 is a 32.8 billion parameter language model, fine-tuned from Qwen/Qwen2.5-Coder-32B-Instruct. This model specializes in code-related tasks, having undergone supervised fine-tuning on the kimi_800 dataset. With a context length of 32768 tokens, it is optimized for processing and generating extensive code sequences.


Overview

March07/Qwen2-5-Coder-32B-sft-kimi-800 is a 32.8 billion parameter language model fine-tuned from Qwen/Qwen2.5-Coder-32B-Instruct. It was trained on the kimi_800 dataset for 10 epochs with a learning rate of 1e-05; this supervised fine-tuning is intended to strengthen its performance on code-related tasks.

Key Training Details

  • Base Model: Qwen/Qwen2.5-Coder-32B-Instruct
  • Fine-tuning Dataset: kimi_800
  • Parameters: 32.8 billion
  • Context Length: 32768 tokens
  • Learning Rate: 1e-05
  • Optimizer: Fused AdamW (betas=(0.9, 0.999), epsilon=1e-08)
  • Epochs: 10
  • Batch Size: 1 per device (train), 8 per device (eval), with 4 gradient accumulation steps; the reported effective train batch size of 32 implies 8 devices (1 × 4 × 8 = 32). See the configuration sketch below.
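
The details above map directly onto standard Hugging Face TrainingArguments. The following is a minimal sketch, assuming the run used the transformers Trainer; the output path and the dataset/trainer wiring around it are illustrative, not taken from the card:

```python
# Hypothetical reconstruction of the reported hyperparameters as
# Hugging Face TrainingArguments. Only the values listed on this
# card are grounded; everything else is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Qwen2-5-Coder-32B-sft-kimi-800",  # assumed output path
    num_train_epochs=10,
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 1 x 4 x 8 devices = 32 effective
    optim="adamw_torch_fused",      # fused AdamW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```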

Intended Use

This model is primarily intended for applications requiring advanced code understanding and generation, leveraging its specialized fine-tuning on a code-centric dataset. Its 32,768-token context window makes it suitable for handling complex and lengthy codebases.
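
For reference, a minimal inference sketch follows. It assumes the checkpoint is published under the repo id above and retains the Qwen2.5 chat template of its base model; the prompt and generation settings are illustrative only:

```python
# Minimal inference sketch (assumptions: the repo id resolves on the
# Hub and the tokenizer carries the base model's chat template).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "March07/Qwen2-5-Coder-32B-sft-kimi-800"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # simplification; the card lists an FP8 quant,
                         # whose loading path may differ
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "Write a Python function that merges two sorted lists."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Strip the prompt tokens and print only the completion.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:],
                       skip_special_tokens=True))
```

Because the model is chat-tuned, routing prompts through the chat template (rather than raw text completion) is the safer default for code-generation requests.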