charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT
The charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT model is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B. It was trained on the qwopus_mix3_scaleswe_data_distilled dataset, which suggests an optimization for software engineering (SWE) or closely related tasks. The model supports a 32K-token context length and is intended for applications that align with its specialized training data.
Overview
This model, charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT, is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B. The fine-tuning used the qwopus_mix3_scaleswe_data_distilled dataset, indicating a specialized focus, most likely on software engineering or related technical domains. The model supports a context length of 32,768 tokens.
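For orientation, the snippet below sketches how the checkpoint could be loaded and queried with the Hugging Face transformers library. The prompt and generation settings are illustrative choices, not values published with this model; Qwen3 architectures require a recent transformers release (4.51 or later).

```python
# Minimal loading-and-generation sketch with Hugging Face transformers.
# Assumes transformers >= 4.51 (which adds Qwen3 support) and enough GPU
# memory for an 8B model; nothing below is an official recipe for this
# checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place layers on available devices
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a linked list."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```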
Training Details
The fine-tuning used a learning rate of 2e-05, a per-device batch size of 1 across 8 GPUs (a global batch size of 8), and ran for 2 epochs. The optimizer was ADAMW_TORCH_FUSED with standard betas and epsilon, paired with a linear learning-rate scheduler and a 0.05 warmup ratio. As the model name indicates, this was a full-parameter supervised fine-tune (SFT) rather than an adapter-based method.
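The training script itself is not published. Purely as an illustration, the reported hyperparameters map onto Hugging Face TrainingArguments roughly as follows; the output directory and bf16 setting are assumptions, not documented values, and multi-GPU execution would come from the launcher (e.g. torchrun) rather than these arguments.

```python
# Illustrative mapping of the reported hyperparameters onto Hugging Face
# TrainingArguments. This is a sketch, not the authors' actual config.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-scaleswe-sft",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=1,       # 1 per device x 8 GPUs = global batch of 8
    num_train_epochs=2,
    optim="adamw_torch_fused",           # ADAMW_TORCH_FUSED, default betas/epsilon
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    bf16=True,                           # assumption; typical for full SFT at 8B scale
)
```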
Potential Use Cases
Given its fine-tuning on a dataset related to "ScaleSWE," this model is likely optimized for tasks such as the following (see the usage sketch after this list):
- Code generation and completion: Assisting developers with writing code.
- Software engineering tasks: Understanding and generating documentation, suggesting bug fixes, or proposing code refactorings.
- Technical question answering: Providing informed responses within software development contexts.
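As a concrete illustration, the hypothetical prompts below exercise each of these use cases, reusing the `model` and `tokenizer` from the loading sketch in the Overview. The prompts are invented examples, not items from the training data.

```python
# Hypothetical prompts matching the use cases above; assumes `model` and
# `tokenizer` were loaded as shown in the Overview section.
prompts = [
    "Write a Python function that parses an INI file into a dict.",     # code generation
    "Suggest a fix: average([]) raises ZeroDivisionError in sum(xs)/len(xs).",  # bug fixing
    "What does a 409 Conflict response typically mean in a REST API?",  # technical Q&A
]

for prompt in prompts:
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```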
Further details on specific capabilities and limitations would require more information on the qwopus_mix3_scaleswe_data_distilled dataset and evaluation results.