charlie-li/Qwen3-4B-Instruct-2507-ScaleSWE-Distilled-Epoch3
charlie-li/Qwen3-4B-Instruct-2507-ScaleSWE-Distilled-Epoch3 is a 4-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It was trained with full-parameter SFT on a ScaleSWE-Distilled ShareGPT-format dataset focused on software engineering tasks, with a sequence cutoff of 71680 tokens, making it suitable for applications requiring extended context understanding in software development.
Model Overview
This model, charlie-li/Qwen3-4B-Instruct-2507-ScaleSWE-Distilled-Epoch3, is a 4-billion-parameter instruction-tuned language model. It is a full-parameter fine-tune of the Qwen/Qwen3-4B-Instruct-2507 base model, trained on a ScaleSWE-Distilled ShareGPT-format dataset. Fine-tuning used LLaMA-Factory v1 SFT, with FSDP2 and Ulysses context parallelism, at a sequence cutoff of 71680 tokens.
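For readers who want to set up a similar run, the sketch below shows roughly what such a LLaMA-Factory full-parameter SFT config could look like. It is an assumption-laden illustration, not the author's actual file: the template name and output directory are placeholders, the dataset name would need to be registered in LLaMA-Factory's dataset_info.json, and the FSDP2/Ulysses parallelism settings are omitted because they are configured outside this file.

```yaml
# Illustrative LLaMA-Factory SFT config (a sketch, not the author's actual
# file; values restate the card, placeholders are marked as assumed).
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
stage: sft
do_train: true
finetuning_type: full        # full-parameter SFT, no PEFT/LoRA
dataset: scaleswe_distilled  # assumed name registered in dataset_info.json
template: qwen               # assumed chat template name
cutoff_len: 71680            # sequence cutoff reported on the card
num_train_epochs: 3.0
bf16: true
output_dir: saves/qwen3-4b-scaleswe-epoch3  # placeholder
```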
Key Training Details
- Base Model: Qwen3-4B-Instruct-2507
- Fine-tuning Type: Full-parameter SFT (no PEFT/LoRA)
- Dataset: scaleswe_distilled_sharegpt_format_nothink_tool_role_128k.v1_messages.jsonl (an illustrative record sketch follows this list)
- Context Length: Trained with a sequence cutoff of 71680 tokens, supporting long-context inputs.
- Precision: bf16 training for efficiency.
- Epochs: 3, with a final logged loss of approximately 0.36.
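The dataset file itself is not included with this card. As a rough illustration only, a ShareGPT-style "messages" record with a tool role, which the filename suggests, might look like the sketch below; the role names and fields here are assumptions, not the dataset's actual schema.

```python
import json

# Hypothetical record illustrating the general ShareGPT/"messages" layout.
# Fields are assumptions inferred from the dataset filename
# (tool role, no-think, 128k), not the file's actual schema.
record = {
    "messages": [
        {"role": "user", "content": "Fix the failing test in utils/parser.py."},
        {"role": "assistant", "content": "Running the test suite to reproduce the failure."},
        {"role": "tool", "content": "FAILED tests/test_parser.py::test_eof - IndexError"},
        {"role": "assistant", "content": "The parser reads past the buffer on EOF; here is a patch."},
    ]
}
print(json.dumps(record))  # each line of the .jsonl would hold one such object
```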
Intended Use Cases
This model is particularly well-suited for tasks related to software engineering, given its fine-tuning on the ScaleSWE-Distilled dataset. Its large context window makes it effective for processing and generating code, understanding complex technical documentation, and assisting with various software development workflows.
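The model can be loaded with the standard Hugging Face transformers API. The snippet below is a minimal sketch, assuming bf16 weights and the chat template inherited from the base Qwen3 tokenizer; it is not an officially documented recipe for this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "charlie-li/Qwen3-4B-Instruct-2507-ScaleSWE-Distilled-Epoch3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bf16 training precision
    device_map="auto",
)

# Chat template is assumed to come from the Qwen3-4B-Instruct-2507 tokenizer.
messages = [
    {"role": "user", "content": "Explain what this shell command does: grep -rn 'TODO' src/"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```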