charlie-li/Qwen3-4B-Instruct-2507-ScaleSWE-Distilled-Epoch3
charlie-li/Qwen3-4B-Instruct-2507-ScaleSWE-Distilled-Epoch3 is a 4-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It was trained with full-parameter SFT on a ScaleSWE-Distilled ShareGPT-format dataset focused on software engineering tasks, with a sequence cutoff of 71680 tokens, making it suitable for applications requiring extended context understanding in software development.
Model Overview
This model, charlie-li/Qwen3-4B-Instruct-2507-ScaleSWE-Distilled-Epoch3, is a 4-billion-parameter instruction-tuned language model. It is a full-parameter fine-tune of the Qwen/Qwen3-4B-Instruct-2507 base model, trained on a ScaleSWE-Distilled ShareGPT-format dataset. Fine-tuning used LLaMA-Factory v1 SFT, with FSDP2 and Ulysses context parallelism, at a sequence cutoff of 71680 tokens.
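For readers who want to set up a similar run, the sketch below shows roughly what such a LLaMA-Factory full-parameter SFT config could look like. It is an assumption-laden illustration, not the author's actual file: the template name and output directory are placeholders, the dataset name would need to be registered in LLaMA-Factory's dataset_info.json, and the FSDP2/Ulysses parallelism settings are omitted because they are configured outside this file.

```yaml
# Illustrative LLaMA-Factory SFT config (a sketch, not the author's actual
# file; values restate the card, placeholders are marked as assumed).
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
stage: sft
do_train: true
finetuning_type: full        # full-parameter SFT, no PEFT/LoRA
dataset: scaleswe_distilled  # assumed name registered in dataset_info.json
template: qwen               # assumed chat template name
cutoff_len: 71680            # sequence cutoff reported on the card
num_train_epochs: 3.0
bf16: true
output_dir: saves/qwen3-4b-scaleswe-epoch3  # placeholder
```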
Key Training Details
- Base Model: Qwen3-4B-Instruct-2507
- Fine-tuning Type: Full-parameter SFT (no PEFT/LoRA)
- Dataset: scaleswe_distilled_sharegpt_format_nothink_tool_role_128k.v1_messages.jsonl (an illustrative record sketch follows this list)
- Context Length: Trained with a sequence cutoff of 71680 tokens, supporting long-context inputs.
- Precision: bf16 training for efficiency.
- Epochs: 3, with a final logged loss of approximately 0.36.
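The dataset file itself is not included with this card. As a rough illustration only, a ShareGPT-style "messages" record with a tool role, which the filename suggests, might look like the sketch below; the role names and fields here are assumptions, not the dataset's actual schema.

```python
import json

# Hypothetical record illustrating the general ShareGPT/"messages" layout.
# Fields are assumptions inferred from the dataset filename
# (tool role, no-think, 128k), not the file's actual schema.
record = {
    "messages": [
        {"role": "user", "content": "Fix the failing test in utils/parser.py."},
        {"role": "assistant", "content": "Running the test suite to reproduce the failure."},
        {"role": "tool", "content": "FAILED tests/test_parser.py::test_eof - IndexError"},
        {"role": "assistant", "content": "The parser reads past the buffer on EOF; here is a patch."},
    ]
}
print(json.dumps(record))  # each line of the .jsonl would hold one such object
```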
Intended Use Cases
This model is particularly well-suited for tasks related to software engineering, given its fine-tuning on the ScaleSWE-Distilled dataset. Its large context window makes it effective for processing and generating code, understanding complex technical documentation, and assisting with various software development workflows.
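The model can be loaded with the standard Hugging Face transformers API. The snippet below is a minimal sketch, assuming bf16 weights and the chat template inherited from the base Qwen3 tokenizer; it is not an officially documented recipe for this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "charlie-li/Qwen3-4B-Instruct-2507-ScaleSWE-Distilled-Epoch3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bf16 training precision
    device_map="auto",
)

# Chat template is assumed to come from the Qwen3-4B-Instruct-2507 tokenizer.
messages = [
    {"role": "user", "content": "Explain what this shell command does: grep -rn 'TODO' src/"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```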