charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Apr 28, 2026 · License: other · Architecture: Transformer

The charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT model is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B. It was trained on the qwopus_mix3_scaleswe_data_distilled dataset, which suggests optimization for software engineering (SWE) or related tasks. With a 32K context length, it is suited to applications that need a capable language model, and it may perform best on tasks aligned with its specialized training data.


Overview

This model, charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT, is an 8 billion parameter language model built upon the Qwen/Qwen3-8B architecture. It has been specifically fine-tuned using the qwopus_mix3_scaleswe_data_distilled dataset, indicating a specialized focus, likely within software engineering or related technical domains. The model supports a substantial context length of 32,768 tokens.
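A practical implication of the 32,768-token window is that long inputs should be budget-checked before prompting. The sketch below uses a crude ~4-characters-per-token heuristic; the helper names and that ratio are illustrative assumptions, not part of the model card (for precise counts, use the model's actual tokenizer).

```python
# Rough check of whether a prompt fits in the model's 32,768-token context.
# The 4-characters-per-token ratio is a crude English-text heuristic; use
# the model's tokenizer for exact counts.
MAX_CONTEXT_TOKENS = 32_768
CHARS_PER_TOKEN = 4  # illustrative assumption


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_context(prompt: str, reserved_for_output: int = 1024) -> bool:
    """True if the prompt plus a reserved output budget fits in the window."""
    return estimate_tokens(prompt) + reserved_for_output <= MAX_CONTEXT_TOKENS


print(fits_in_context("def add(a, b):\n    return a + b"))  # True
```

Reserving part of the window for the generated output (here 1,024 tokens) avoids truncating the model's response even when the prompt technically fits.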

Training Details

Fine-tuning used a learning rate of 2e-05 with a per-device batch size of 1 across 8 GPUs (effective batch size 8), trained for 2 epochs. The optimizer was ADAMW_TORCH_FUSED with standard betas and epsilon, paired with a linear learning-rate scheduler and a 0.05 warmup ratio. This is a standard full supervised fine-tuning (SFT) configuration applied to the specialized dataset.
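The reported hyperparameters can be collected into a single configuration sketch. The dict layout and the derived effective batch size are illustrative (field names loosely follow Hugging Face `TrainingArguments`); only the values come from the card:

```python
# Fine-tuning hyperparameters as reported on the model card; the dict
# structure itself is an illustrative reconstruction, not the actual
# training script.
training_config = {
    "learning_rate": 2e-05,
    "per_device_train_batch_size": 1,
    "num_gpus": 8,
    "num_train_epochs": 2,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.05,
}

# Effective (global) batch size: per-device batch size times GPU count.
effective_batch_size = (
    training_config["per_device_train_batch_size"] * training_config["num_gpus"]
)
print(effective_batch_size)  # 8
```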

Potential Use Cases

Given its fine-tuning on a dataset related to "ScaleSWE," this model is likely optimized for tasks such as:

  • Code generation and completion: Assisting developers with writing code.
  • Software engineering tasks: potentially understanding and generating documentation, suggesting bug fixes, or proposing code refactorings.
  • Technical question answering: Providing informed responses within software development contexts.
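For the code-assistance use cases above, prompts to Qwen-family models are typically wrapped in the ChatML chat format. The helper below is a minimal sketch under that assumption; in practice, `tokenizer.apply_chat_template` from the transformers library handles this for you.

```python
# Minimal ChatML-style prompt builder (Qwen-family chat format); a sketch,
# not a substitute for the tokenizer's own chat template.
def build_chatml_prompt(messages: list[dict]) -> str:
    """Render [{'role': ..., 'content': ...}] into a ChatML string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
])
print(prompt)
```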

Further details on specific capabilities and limitations would require more information on the qwopus_mix3_scaleswe_data_distilled dataset and evaluation results.