hadadxyz/Qwen3-4B-Diversity

Hugging Face
Task: Text generation
Model size: 4B
Quantization: BF16
Context length: 32k
Published: Apr 6, 2026
License: apache-2.0
Architecture: Transformer

Qwen3-4B-Diversity is a 4 billion parameter language model developed by hadadxyz, fine-tuned from Qwen/Qwen3-4B. It is trained on a diverse collection of high-quality reasoning datasets, combining knowledge distilled from various state-of-the-art AI systems. This model offers enhanced reasoning capabilities across mathematics, coding, general problem-solving, and multi-turn conversations, with a context length of 32768 tokens.


Overview

Qwen3-4B-Diversity is a 4 billion parameter language model developed by hadadxyz, fine-tuned from the Qwen/Qwen3-4B base model. It was trained with supervised fine-tuning using parameter-efficient methods for two epochs, taking approximately 17 hours on a single A100-80GB GPU. The training data comprises over 24,000 high-quality reasoning examples distilled from various advanced AI systems, including Kimi K2.5, Qwen3.5, Claude Opus 4.6, Gemini 3 Pro, GPT-5.2, GLM-4.7, GLM-5, DeepSeek V3.2-Speciale, and GPT-5 Codex.
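The model can be run with the standard Hugging Face transformers workflow. The sketch below is illustrative, not an official usage snippet from the model card: it assumes the weights load under the repository id hadadxyz/Qwen3-4B-Diversity with the stock Qwen3 chat template, and that the host has enough memory for the BF16 4B weights. The `build_messages` and `generate` helper names are hypothetical.

```python
# Minimal inference sketch (assumptions: Hub repo id "hadadxyz/Qwen3-4B-Diversity",
# standard Qwen3 chat template, hardware with room for BF16 4B weights).

def build_messages(system, turns):
    """Assemble a chat-template message list from a system prompt and
    (user, assistant) turn pairs; the final user turn may be unanswered."""
    messages = [{"role": "system", "content": system}]
    for user, assistant in turns:
        messages.append({"role": "user", "content": user})
        if assistant is not None:
            messages.append({"role": "assistant", "content": assistant})
    return messages

def generate(messages, max_new_tokens=512):
    # transformers/torch are imported lazily so the helper above
    # stays dependency-free for callers who only build prompts.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("hadadxyz/Qwen3-4B-Diversity")
    model = AutoModelForCausalLM.from_pretrained(
        "hadadxyz/Qwen3-4B-Diversity",
        torch_dtype=torch.bfloat16,  # matches the published BF16 weights
        device_map="auto",
    )
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

A typical call would be `generate(build_messages("You are a helpful assistant.", [("Prove that 2 is prime.", None)]))`.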

Key Capabilities

  • Advanced Reasoning: Excels at breaking down complex problems and providing detailed reasoning processes.
  • Mathematical Problem Solving: Enhanced capabilities for mathematical reasoning due to dedicated math-focused datasets.
  • Code Generation and Understanding: Improved coding abilities, benefiting from multiple code-reasoning datasets.
  • Multi-Turn Conversations: Handles extended dialogues well and maintains context across turns.
  • Domain Versatility: Offers flexibility across different domains and task types by integrating reasoning patterns from various AI systems.

Good For

  • Applications requiring strong logical deduction and problem-solving.
  • Tasks involving mathematical calculations and proofs.
  • Code generation, debugging, and understanding.
  • Building conversational agents that can maintain context over long interactions.
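For long-running conversational agents, the 32,768-token context window eventually fills and old turns must be dropped. The sketch below is one simple way to budget that window; it is not part of the model card. The 4-characters-per-token estimate and the `trim_history` helper are assumptions for illustration — in practice you would measure lengths with the model's own tokenizer.

```python
# Sketch of a context-window budgeter for long multi-turn sessions.
# Assumptions: a crude 4-chars-per-token estimate stands in for the real
# tokenizer; the 32,768-token limit comes from the model card above.

CONTEXT_LIMIT = 32_768

def estimate_tokens(text):
    """Rough token count; swap in the model tokenizer for accuracy."""
    return max(1, len(text) // 4)

def trim_history(messages, reserve_for_reply=1024, limit=CONTEXT_LIMIT):
    """Drop the oldest non-system turns until the estimated prompt size,
    plus room reserved for the model's reply, fits the context window."""
    budget = limit - reserve_for_reply
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)

    while turns and total(system) + total(turns) > budget:
        turns.pop(0)  # discard the oldest turn first; keep the system prompt
    return system + turns
```

The system prompt is always retained, so the agent's instructions survive even when early conversation turns are evicted.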