Model Overview
laion/Kimi-K2T-ling-coder-sft-sandboxes-1-maxeps-32k is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was specialized on the penfever/Kimi-K2T-ling-coder-sft-sandboxes-1-maxeps-32k dataset, giving it a focus on coding and programming tasks, and it supports a context length of 32,768 tokens, which suits longer code files and complex programming instructions.
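A minimal loading and generation sketch, assuming the checkpoint is hosted on the Hugging Face Hub and loads through the standard transformers causal-LM API; the prompt and dtype choice are illustrative, not taken from the model card:

```python
# Sketch: load the model and generate a code completion.
# Assumes the checkpoint follows the standard transformers causal-LM interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Kimi-K2T-ling-coder-sft-sandboxes-1-maxeps-32k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit your hardware
    device_map="auto",
)

prompt = "Write a Python function that parses a CSV file into a list of dicts."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```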
Key Capabilities
- Code-centric Fine-tuning: Optimized for tasks related to code generation, completion, and understanding due to its specialized training data.
- Large Context Window: Benefits from a 32k-token context, allowing it to handle extensive codebases or detailed technical specifications; see the token-budget sketch after this list.
- Qwen3-8B Base: Inherits the foundational capabilities of Qwen3-8B, providing a strong foundation for its specialized performance.
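To use the full window safely, it helps to check a prompt's token count before generation. A small sketch, assuming the standard transformers tokenizer API; the 32,768-token limit comes from the overview above, and the input file path is hypothetical:

```python
# Sketch: verify a long prompt fits the 32,768-token context window
# before sending it to the model. The source file path is hypothetical.
from transformers import AutoTokenizer

MAX_CONTEXT = 32_768  # model's context length per the overview above

tokenizer = AutoTokenizer.from_pretrained(
    "laion/Kimi-K2T-ling-coder-sft-sandboxes-1-maxeps-32k"
)

with open("large_module.py") as f:  # hypothetical input file
    prompt = "Review the following code for bugs:\n\n" + f.read()

n_tokens = len(tokenizer.encode(prompt))
budget = MAX_CONTEXT - n_tokens  # tokens left for the model's reply
print(f"{n_tokens} prompt tokens, {budget} left for generation")
if budget <= 0:
    raise ValueError("Prompt exceeds the model's 32k context window")
```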
Training Details
The model was trained for 7 epochs with a learning rate of 4e-05 and a total batch size of 16 across 8 GPUs. The optimizer was ADAMW_TORCH_FUSED (PyTorch's fused AdamW) with a cosine learning rate scheduler and a warmup ratio of 0.1.
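These hyperparameters map directly onto transformers TrainingArguments. The following is a hedged reconstruction, not the authors' actual training script; in particular, the per-device batch size of 2 assumes the total batch of 16 was split evenly across the 8 GPUs with no gradient accumulation:

```python
# Sketch: the reported hyperparameters expressed as transformers
# TrainingArguments. This is a reconstruction, not the original recipe.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="kimi-k2t-ling-coder-sft",  # hypothetical output path
    learning_rate=4e-5,
    num_train_epochs=7,
    per_device_train_batch_size=2,  # assumption: 16 total / 8 GPUs,
                                    # no gradient accumulation
    optim="adamw_torch_fused",      # ADAMW_TORCH_FUSED
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                      # assumption: typical for this setup
)
```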
Good For
- Developers and researchers working on code generation and analysis.
- Applications that require a model proficient in understanding and producing code.
- Scenarios where a large context window is crucial for handling complex coding problems.