penfever/kimi-k2-swesmith_with_plain_docker-sandboxes-maxeps-32k
The penfever/kimi-k2-swesmith_with_plain_docker-sandboxes-maxeps-32k model is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B. It features a 32,768 token context length, making it suitable for processing extensive inputs. This model is specifically fine-tuned on the penfever/kimi-k2-swesmith_with_plain_docker-sandboxes-maxeps-32k dataset, indicating a specialized application focus. Its architecture and training suggest potential for tasks requiring deep contextual understanding over long sequences.
Model Overview
This model, penfever/kimi-k2-swesmith_with_plain_docker-sandboxes-maxeps-32k, is an 8 billion parameter language model derived from the Qwen/Qwen3-8B architecture. It has been fine-tuned on a specific dataset, also named penfever/kimi-k2-swesmith_with_plain_docker-sandboxes-maxeps-32k, suggesting a specialized application or domain. A notable feature is its substantial context window of 32,768 tokens, enabling it to process and generate responses based on very long input sequences.
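Since the checkpoint is derived from Qwen/Qwen3-8B, it is presumably loadable as a standard Hugging Face Transformers causal language model. The snippet below is a minimal loading sketch, not an official usage example; the dtype and device settings are assumptions chosen to fit an 8B model on common GPU hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "penfever/kimi-k2-swesmith_with_plain_docker-sandboxes-maxeps-32k"

# Load tokenizer and model; bfloat16 and device_map="auto" (requires the
# `accelerate` package) are assumptions, not settings stated in the card.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```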
Training Details
The model was trained with a learning rate of 4e-05 on a multi-GPU setup of 8 devices, giving a total batch size of 16 (train_batch_size of 1 × gradient_accumulation_steps of 2 × 8 GPUs). The optimizer was ADAMW_TORCH_FUSED with specific beta and epsilon values, and a cosine learning rate scheduler with a 0.1 warmup ratio was applied over 7 epochs. Training used Transformers 4.57.3, PyTorch 2.9.0+cu128, Datasets 4.4.1, and Tokenizers 0.22.1.
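As a rough illustration only, the reported hyperparameters map approximately onto a `transformers.TrainingArguments` configuration like the one below. The actual training script is not published with the card, so the output path is hypothetical and anything not listed above (precision, optimizer betas/epsilon, logging) is left at library defaults.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; field values not stated in the
# card are assumptions or library defaults, not the author's exact setup.
training_args = TrainingArguments(
    output_dir="kimi-k2-swesmith-finetune",  # hypothetical output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,           # 1 x 2 accum x 8 GPUs = 16 total
    gradient_accumulation_steps=2,
    num_train_epochs=7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
)
```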
Key Characteristics
- Base Model: Qwen/Qwen3-8B
- Parameter Count: 8 billion
- Context Length: 32,768 tokens
- Fine-tuning: Specialized on the penfever/kimi-k2-swesmith_with_plain_docker-sandboxes-maxeps-32k dataset
Potential Use Cases
Given its 32,768-token context window and specialized fine-tuning, this model is likely suited to extensive document analysis, long-form content generation, and other tasks where tracking contextual relationships across many tokens matters. Its fine-tuning dataset implies a focus on a particular domain, which would be its primary strength; a sketch of long-context usage follows.
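For example, a long document could be passed through a chat template for summarization. This is an illustrative sketch only: it assumes the `tokenizer` and `model` objects loaded earlier and that the checkpoint retains the base Qwen3 chat template; the file name is hypothetical.

```python
# Illustrative long-context usage; assumes `tokenizer` and `model` are loaded
# as in the earlier snippet and that the base Qwen3 chat template is retained.
long_document = open("report.txt").read()  # hypothetical long input file
messages = [
    {"role": "user",
     "content": f"Summarize the following document:\n\n{long_document}"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```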