laion/Kimi-2.5-swesmith-r2egym-solved-maxeps-32k__Qwen3-8B

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Mar 28, 2026 · License: other · Architecture: Transformer

The laion/Kimi-2.5-swesmith-r2egym-solved-maxeps-32k__Qwen3-8B model is a fine-tuned variant of Qwen3-8B. It was trained on datasets derived from the Kimi-2.5-swesmith and Kimi-2.5-r2egym sandbox environments and is optimized for problem-solving within those environments, building on the base Qwen3-8B model's structured-reasoning capabilities.


Model Overview

This model, laion/Kimi-2.5-swesmith-r2egym-solved-maxeps-32k__Qwen3-8B, is a fine-tuned version of the Qwen3-8B base model. It was trained on two datasets: penfever/Kimi-2.5-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-32k-reward1 and penfever/Kimi-2.5-r2egym_sandboxes-maxeps-32k-reward1. The fine-tuning adapts the model to problem-solving and reasoning tasks within the Kimi-2.5-swesmith and Kimi-2.5-r2egym sandbox environments.

Training Details

Training used a learning rate of 4e-05 and a total batch size of 96 (3 gradient accumulation steps across 32 devices) for 7 epochs. The optimizer was ADAMW_TORCH_FUSED with specific beta and epsilon parameters. The run used Transformers 4.57.6, PyTorch 2.9.1+cu130, Datasets 4.7.0, and Tokenizers 0.22.2.
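The stated numbers are internally consistent: with 32 devices and 3 gradient accumulation steps, a total batch of 96 implies a per-device micro-batch of 1. A minimal sketch of that arithmetic (the per-device micro-batch size is derived here, not stated in the card):

```python
# Effective batch size arithmetic for the stated training setup.
# num_devices, grad_accum_steps, and total_batch_size come from the card;
# the per-device micro-batch is inferred from them.
num_devices = 32
grad_accum_steps = 3
total_batch_size = 96

# Per-device micro-batch size implied by the totals above.
per_device_batch = total_batch_size // (num_devices * grad_accum_steps)
assert per_device_batch * num_devices * grad_accum_steps == total_batch_size

print(per_device_batch)  # → 1
```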

Key Characteristics

  • Base Model: Qwen3-8B architecture.
  • Specialized Fine-tuning: Trained on datasets derived from Kimi-2.5-swesmith and Kimi-2.5-r2egym sandboxes.
  • Optimized for: Tasks related to these specific sandbox environments, likely involving structured problem-solving or code-related reasoning.

Intended Use Cases

Given its specialized training, this model is best suited for applications requiring performance within the Kimi-2.5-swesmith and Kimi-2.5-r2egym ecosystems. Developers working on tasks or research directly related to these sandbox environments would find this model particularly relevant.
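If the model is served behind an OpenAI-compatible endpoint (a common hosting pattern for text-generation models, though the card does not specify one), a request might be constructed as in the sketch below. The prompt, sampling settings, and route are illustrative assumptions; only the model ID comes from the card.

```python
import json

# Hypothetical chat-completions payload for an OpenAI-compatible server
# hosting this model. The model ID is from the card; the prompt and
# sampling parameters are illustrative, not recommendations from the card.
payload = {
    "model": "laion/Kimi-2.5-swesmith-r2egym-solved-maxeps-32k__Qwen3-8B",
    "messages": [
        {
            "role": "user",
            "content": "Fix the failing test in the attached repository.",
        },
    ],
    "max_tokens": 1024,   # stays well inside the 32k context window
    "temperature": 0.2,   # low temperature suits structured code tasks
}

body = json.dumps(payload)
print(json.loads(body)["model"])
```

The serialized body would be POSTed to the host's chat-completions route with an HTTP client; consult the actual provider's API documentation for the base URL and authentication.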