laion/sft__Kimi-2-5-inferredbugs-sandboxes-maxeps-32k__Qwen3-8B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Mar 14, 2026License:otherArchitecture:Transformer Warm

This model is a fine-tuned version of Qwen/Qwen3-8B, developed by Qwen, specifically adapted from the Qwen3-8B architecture. It was fine-tuned on the Kimi-2.5-inferredbugs-sandboxes-maxeps-32k dataset. The model's specific capabilities and intended uses require further information, as the provided details are limited.

Loading preview...

Model Overview

This model is a fine-tuned iteration of the Qwen3-8B base model, originally developed by Qwen. The fine-tuning process utilized the /e/data1/datasets/playground/ot/hf_hub/datasets--penfever--Kimi-2.5-inferredbugs-sandboxes-maxeps-32k/snapshots/b8446f4e9f3c6a0a77d8866517b15b4ddeb7647d_thinking_preprocessed dataset.

Training Details

The fine-tuning procedure involved specific hyperparameters:

  • Learning Rate: 4e-05
  • Batch Sizes: train_batch_size of 1, eval_batch_size of 8
  • Gradient Accumulation: 3 steps, leading to a total_train_batch_size of 96
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08
  • LR Scheduler: Cosine type with a warmup ratio of 0.1
  • Epochs: 7.0
  • Distributed Training: Multi-GPU setup across 32 devices.

Current Status

Further information regarding the model's specific description, intended uses, limitations, and detailed training/evaluation data is currently pending. Users should consult future updates for a comprehensive understanding of its capabilities and optimal application scenarios.