March07/Qwen2-5-Coder-32B-sft-kimi-800
March07/Qwen2-5-Coder-32B-sft-kimi-800 is a 32.8-billion-parameter language model, fine-tuned from Qwen/Qwen2.5-Coder-32B-Instruct. The model specializes in code-related tasks, having undergone supervised fine-tuning on the kimi_800 dataset. With a context length of 32768 tokens, it is suited to processing and generating long code sequences.
Overview
March07/Qwen2-5-Coder-32B-sft-kimi-800 is a 32.8-billion-parameter language model fine-tuned from Qwen/Qwen2.5-Coder-32B-Instruct. It was trained on the kimi_800 dataset for 10 epochs with a learning rate of 1e-05. This supervised fine-tuning aims to strengthen its capabilities on code-related applications.
Key Training Details
- Base Model: Qwen/Qwen2.5-Coder-32B-Instruct
- Fine-tuning Dataset: kimi_800
- Parameters: 32.8 billion
- Context Length: 32768 tokens
- Learning Rate: 1e-05
- Optimizer: Fused AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- Epochs: 10
- Batch Size: 1 per device (train), 8 per device (eval), with 4 gradient accumulation steps, for a total effective train batch size of 32
Intended Use
This model is primarily intended for applications requiring advanced code understanding and generation, leveraging its specialized fine-tuning on a code-centric dataset. Its large context window makes it suitable for handling complex and lengthy codebases.
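A minimal generation sketch using the standard transformers chat API is shown below. The prompt content is an arbitrary example; dtype and device placement are assumptions that depend on available hardware (a 32B model typically requires multiple high-memory GPUs or quantization).

```python
# Sketch of loading the model for code generation via transformers.
# torch_dtype="auto" and device_map="auto" are hardware-dependent assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "March07/Qwen2-5-Coder-32B-sft-kimi-800"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```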