laion/Kimi-2.5-swesmith-r2egym-solved-maxeps-32k__Qwen3-8B
laion/Kimi-2.5-swesmith-r2egym-solved-maxeps-32k__Qwen3-8B is a fine-tuned variant of Qwen3-8B, trained on datasets derived from the Kimi-2.5-swesmith and Kimi-2.5-r2egym sandboxes. It targets problem-solving tasks within these environments, building on the base model's structured-reasoning capabilities.
Model Overview
This model, laion/Kimi-2.5-swesmith-r2egym-solved-maxeps-32k__Qwen3-8B, is a fine-tuned version of the Qwen3-8B base model, trained on two datasets: penfever/Kimi-2.5-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-32k-reward1 and penfever/Kimi-2.5-r2egym_sandboxes-maxeps-32k-reward1. The fine-tuning adapts the model to problem-solving and reasoning tasks within the Kimi-2.5-swesmith and Kimi-2.5-r2egym sandbox environments.
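The two training datasets can be inspected with the Hugging Face datasets library. The sketch below is illustrative: the dataset ids are taken from this card, but the split name "train" and the use of streaming are assumptions.

```python
# Sketch for inspecting the two fine-tuning datasets named in this card
# (assumes the datasets library is installed; the "train" split name is assumed).
from datasets import load_dataset

swesmith = load_dataset(
    "penfever/Kimi-2.5-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-32k-reward1",
    split="train",
    streaming=True,  # stream to avoid downloading the full dataset up front
)
r2egym = load_dataset(
    "penfever/Kimi-2.5-r2egym_sandboxes-maxeps-32k-reward1",
    split="train",
    streaming=True,
)

print(next(iter(swesmith)))  # peek at one record from the swesmith sandbox data
print(next(iter(r2egym)))    # peek at one record from the r2egym sandbox data
```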
Training Details
Training used a learning rate of 4e-05 and a total batch size of 96 (3 gradient accumulation steps across 32 devices), and ran for 7 epochs. The optimizer was ADAMW_TORCH_FUSED with specific beta and epsilon parameters. The run used Transformers 4.57.6, PyTorch 2.9.1+cu130, Datasets 4.7.0, and Tokenizers 0.22.2.
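As a hedged sketch, these hyperparameters map onto transformers.TrainingArguments roughly as follows. The per-device batch size of 1 is inferred from 3 accumulation steps × 32 devices = 96; the beta/epsilon values shown are the AdamW defaults, since the card does not state the actual values; the output directory and precision setting are likewise assumptions.

```python
# Illustrative mapping of the reported hyperparameters onto TrainingArguments.
# This is a reconstruction, not the authors' actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Kimi-2.5-swesmith-r2egym-solved-maxeps-32k__Qwen3-8B",  # assumed
    learning_rate=4e-05,
    per_device_train_batch_size=1,  # inferred: 1 x 3 steps x 32 devices = 96 total
    gradient_accumulation_steps=3,
    num_train_epochs=7,
    optim="adamw_torch_fused",
    adam_beta1=0.9,    # AdamW defaults; the card only says "specific beta
    adam_beta2=0.999,  # and epsilon parameters" without giving values
    adam_epsilon=1e-08,
    bf16=True,         # assumed precision; not stated in the card
)
```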
Key Characteristics
- Base Model: Qwen3-8B architecture.
- Specialized Fine-tuning: Trained on datasets derived from Kimi-2.5-swesmith and Kimi-2.5-r2egym sandboxes.
- Optimized for: Tasks related to these specific sandbox environments, likely involving structured problem-solving or code-related reasoning.
Intended Use Cases
Given its specialized training, this model is best suited for applications within the Kimi-2.5-swesmith and Kimi-2.5-r2egym ecosystems. Developers and researchers working directly with these sandbox environments will find it most relevant.
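A minimal inference sketch follows the standard Transformers pattern for Qwen3-based checkpoints. The model id is taken from this card; the prompt, dtype, and generation length are illustrative assumptions.

```python
# Minimal inference sketch (assumes transformers and torch are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Kimi-2.5-swesmith-r2egym-solved-maxeps-32k__Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; pick what your hardware supports
    device_map="auto",
)

# The chat template is inherited from the Qwen3-8B base; the prompt is illustrative.
messages = [{"role": "user", "content": "Fix the failing test in this repository snippet: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```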