charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT
The charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT model is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B. It was trained on the qwopus_mix3_scaleswe_data_distilled dataset, which suggests an optimization for software engineering (SWE) or closely related tasks. The model supports a 32K-token context length and is intended for applications that align with its specialized training data.
Overview
This model, charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT, is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B. The fine-tuning used the qwopus_mix3_scaleswe_data_distilled dataset, indicating a specialized focus, most likely on software engineering or related technical domains. The model supports a context length of 32,768 tokens.
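For orientation, the snippet below sketches how the checkpoint could be loaded and queried with the Hugging Face transformers library. The prompt and generation settings are illustrative choices, not values published with this model; Qwen3 architectures require a recent transformers release (4.51 or later).

```python
# Minimal loading-and-generation sketch with Hugging Face transformers.
# Assumes transformers >= 4.51 (which adds Qwen3 support) and enough GPU
# memory for an 8B model; nothing below is an official recipe for this
# checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "charlie-li/Qwen3-8B-ScaleSWE-Distilled-Full-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place layers on available devices
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a linked list."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```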
Training Details
The fine-tuning used a learning rate of 2e-05, a per-device batch size of 1 across 8 GPUs (a global batch size of 8), and ran for 2 epochs. The optimizer was ADAMW_TORCH_FUSED with standard betas and epsilon, paired with a linear learning-rate scheduler and a 0.05 warmup ratio. As the model name indicates, this was a full-parameter supervised fine-tune (SFT) rather than an adapter-based method.
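The training script itself is not published. Purely as an illustration, the reported hyperparameters map onto Hugging Face TrainingArguments roughly as follows; the output directory and bf16 setting are assumptions, not documented values, and multi-GPU execution would come from the launcher (e.g. torchrun) rather than these arguments.

```python
# Illustrative mapping of the reported hyperparameters onto Hugging Face
# TrainingArguments. This is a sketch, not the authors' actual config.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-scaleswe-sft",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=1,       # 1 per device x 8 GPUs = global batch of 8
    num_train_epochs=2,
    optim="adamw_torch_fused",           # ADAMW_TORCH_FUSED, default betas/epsilon
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    bf16=True,                           # assumption; typical for full SFT at 8B scale
)
```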
Potential Use Cases
Given its fine-tuning on a dataset related to "ScaleSWE," this model is likely optimized for tasks such as the following (see the usage sketch after this list):
- Code generation and completion: Assisting developers with writing code.
- Software engineering tasks: Understanding and generating documentation, suggesting bug fixes, or proposing code refactorings.
- Technical question answering: Providing informed responses within software development contexts.
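As a concrete illustration, the hypothetical prompts below exercise each of these use cases, reusing the `model` and `tokenizer` from the loading sketch in the Overview. The prompts are invented examples, not items from the training data.

```python
# Hypothetical prompts matching the use cases above; assumes `model` and
# `tokenizer` were loaded as shown in the Overview section.
prompts = [
    "Write a Python function that parses an INI file into a dict.",     # code generation
    "Suggest a fix: average([]) raises ZeroDivisionError in sum(xs)/len(xs).",  # bug fixing
    "What does a 409 Conflict response typically mean in a REST API?",  # technical Q&A
]

for prompt in prompts:
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```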
Further details on specific capabilities and limitations would require more information on the qwopus_mix3_scaleswe_data_distilled dataset and evaluation results.