Overview
hadadxyz/Qwen3-8B-Ultra-Distilled is an 8-billion-parameter language model fine-tuned from the base Qwen/Qwen3-8B model. Its primary goal is to significantly improve the base model's complex, step-by-step reasoning and to broaden its general instruction-following capabilities. The model was trained with Supervised Fine-Tuning (SFT) using Parameter-Efficient Fine-Tuning (PEFT) on a single NVIDIA A100 80GB GPU.
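The exact training recipe is not published here. As an illustration only, the sketch below shows a typical LoRA-based SFT run using the peft and trl libraries; the dataset name and every hyperparameter (rank, alpha, learning rate, batch sizes) are assumptions, not the model's actual configuration.

```python
# Illustrative LoRA-based SFT sketch; hyperparameters and dataset are
# assumptions, not the published training recipe for this model.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical chat-formatted dataset of reasoning traces.
dataset = load_dataset("your-org/your-reasoning-dataset", split="train")

peft_config = LoraConfig(
    r=16,                    # assumed LoRA rank
    lora_alpha=32,           # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="qwen3-8b-sft",
    per_device_train_batch_size=2,   # sized for a single A100 80GB
    gradient_accumulation_steps=8,
    learning_rate=2e-4,              # assumed
    num_train_epochs=1,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-8B",           # base model
    train_dataset=dataset,
    peft_config=peft_config,
    args=training_args,
)
trainer.train()
```

Because only the LoRA adapter weights are updated, a run like this fits comfortably in 80GB of GPU memory, which is consistent with the single-GPU setup described above.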
Key Capabilities
- Enhanced Reasoning: Trained on a curated dataset of reasoning traces distilled from advanced AI models such as Claude Opus 4.6, Gemini 3 Pro, and GPT 5.2, with detailed chain-of-thought examples spanning mathematics, science, coding, and logic.
- Improved Instruction Following: Incorporates a diverse set of instruction-response pairs, including an "uncensored" dataset to reduce unnecessary refusals and improve engagement with a wider range of legitimate requests.
- Context Length: Supports a substantial 40,960-token context window, allowing the model to process and generate longer, more complex interactions (see the inference sketch after this list).
- Efficient Training: Reached its specialized capabilities in an estimated 6-9 hours of training on the single-GPU setup described above.
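The sketch below shows one way to load the model and generate with transformers. It assumes the fine-tune inherits the base Qwen3 chat template, including its enable_thinking flag; the prompt and generation length are illustrative.

```python
# Minimal inference sketch, assuming the model keeps the standard Qwen3
# chat template (including the enable_thinking flag).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hadadxyz/Qwen3-8B-Ultra-Distilled"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Prove that the sum of two odd numbers is even."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # emit a chain-of-thought block before the answer
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# The 40,960-token context window bounds prompt and generation combined.
outputs = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

With enable_thinking=False, a standard Qwen3 template skips the reasoning block and answers directly, which is useful for latency-sensitive applications.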
Good For
- Complex Problem Solving: Ideal for applications that need the model to "think through" a problem step-by-step before producing a final answer, as shown in the sketch after this list.
- Analytical Tasks: Excels in domains such as mathematics, scientific inquiry, and coding, where logical deduction and structured reasoning are crucial.
- Broad Instruction Adherence: Suitable for general-purpose instruction following, especially where nuanced understanding and reduced refusal rates are desired.
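For the step-by-step use cases above, it can help to separate the reasoning trace from the final answer. The helper below assumes the base Qwen3 convention that the trace is closed by the `</think>` token (id 151668 in the Qwen3 tokenizer); if this fine-tune changed the template, the split point would differ.

```python
# Split generated token ids into reasoning trace and final answer, assuming
# the base Qwen3 convention that the trace closes with </think> (id 151668).
THINK_END_ID = 151668  # </think> in the Qwen3 tokenizer

def split_reasoning(output_ids: list[int], tokenizer) -> tuple[str, str]:
    try:
        # Index just past the last </think> token; everything before it
        # is the chain-of-thought trace.
        cut = len(output_ids) - output_ids[::-1].index(THINK_END_ID)
    except ValueError:
        cut = 0  # no trace emitted (e.g., enable_thinking=False)
    reasoning = tokenizer.decode(output_ids[:cut], skip_special_tokens=True).strip()
    answer = tokenizer.decode(output_ids[cut:], skip_special_tokens=True).strip()
    return reasoning, answer

# Usage with `outputs`, `inputs`, and `tokenizer` from the inference sketch above:
# new_ids = outputs[0][inputs.input_ids.shape[1]:].tolist()
# reasoning, answer = split_reasoning(new_ids, tokenizer)
```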