laion/sft__Kimi-2-5-swesmith-oracle-maxeps-32k__Qwen3-8B

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Mar 27, 2026 | License: other | Architecture: Transformer | Warm

This model is a fine-tuned version of Qwen3-8B, an 8-billion-parameter causal language model developed by Qwen and further fine-tuned by laion. It was trained with a 32k-token context length. The fine-tuning dataset is named 'Kimi-2.5-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-32k', which suggests a specialization in code or structured problem-solving environments.

Model Overview

This model, sft__Kimi-2-5-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-32k__40-0__Qwen3-8B, is a fine-tuned version of the Qwen3-8B base model developed by Qwen. It has 8 billion parameters and supports a context length of 32,768 tokens, which suits longer inputs and extended multi-turn interactions.

Training Details

The model was fine-tuned with a learning rate of 4e-05 over 7 epochs, with a total training batch size of 96 across 32 GPUs, which works out to 3 samples per GPU per optimizer step. Training used the AdamW_TORCH_FUSED optimizer (the fused PyTorch implementation of AdamW) and a cosine learning rate scheduler with a 0.1 warmup ratio, so the learning rate ramps up over the first 10% of training steps and then decays along a cosine curve.
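The schedule and batch arithmetic described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the training code used for this model: the function name lr_at and the total step count are hypothetical (the card does not state the dataset size or number of optimizer steps), the decay is assumed to end at zero, and the warmup step count is assumed to be the truncated product of total steps and warmup ratio.

```python
import math

# Values stated in the card.
PEAK_LR = 4e-5
WARMUP_RATIO = 0.1
TOTAL_BATCH_SIZE = 96
NUM_GPUS = 32


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (hypothetical helper).

    Linear warmup from 0 to peak_lr, then cosine decay to 0 (assumed).
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Samples each GPU contributes per optimizer step
# (per-device batch size times gradient-accumulation steps).
per_gpu_samples = TOTAL_BATCH_SIZE // NUM_GPUS

if __name__ == "__main__":
    total_steps = 1000  # hypothetical; not given in the card
    for step in (0, 50, 100, 550, 1000):
        print(step, lr_at(step, total_steps))
    print("samples per GPU per step:", per_gpu_samples)
```

With the hypothetical 1000 steps, the rate peaks at step 100 and is halfway down the cosine curve at step 550. How the 3 samples per GPU are split between per-device batch size and gradient accumulation is not stated in the card.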

Potential Use Cases

Given its fine-tuning on the Kimi-2.5-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-32k dataset, this model is likely optimized for tasks involving:

  • Code analysis and generation: The dataset name implies interaction with sandboxed environments and tests, which are common in software development.
  • Problem-solving in structured environments: 'Oracle verified' and 'maxeps' suggest a focus on tasks requiring precise, verifiable outputs, potentially in technical or logical domains.
  • Extended context understanding: The 32k context length is beneficial for handling complex problem descriptions, multi-file codebases, or lengthy technical documentation.
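
For multi-file inputs, the 32,768-token window has to cover both the prompt and the generated output. The sketch below shows one simple way to budget for that. Everything in it beyond the context length is an assumption: the helper names pack_files and whitespace_count, the 4,096-token output reserve, and the greedy skip-if-too-large policy are illustrative, and the whitespace counter is only a crude stand-in for the model's real tokenizer.

```python
CONTEXT_LENGTH = 32_768  # from the card


def pack_files(files, count_tokens, reserve_for_output=4_096,
               context_length=CONTEXT_LENGTH):
    """Greedily pick files, in order, that fit the prompt budget.

    files: list of (name, text) pairs.
    count_tokens: callable returning a token count for a string
                  (in practice, the model tokenizer; assumed here).
    Returns (selected_names, tokens_used).
    """
    budget = context_length - reserve_for_output
    selected, used = [], 0
    for name, text in files:
        cost = count_tokens(text)
        if used + cost > budget:
            continue  # does not fit; try the next, possibly smaller, file
        selected.append(name)
        used += cost
    return selected, used


def whitespace_count(text):
    """Crude stand-in token counter: number of whitespace-separated words."""
    return len(text.split())


if __name__ == "__main__":
    demo = [
        ("a.py", "x " * 10_000),
        ("b.py", "y " * 20_000),
        ("c.py", "z " * 5_000),
    ]
    print(pack_files(demo, whitespace_count))
```

Real token counts are usually higher than word counts for source code, so a production version would pass the tokenizer's own length function as count_tokens.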