andresnowak/Qwen3-0.6B-instruction-finetuned

Text Generation · Model Size: 0.8B · Quant: BF16 · Context Length: 32k · Published: May 24, 2025 · Architecture: Transformer

andresnowak/Qwen3-0.6B-instruction-finetuned is a 0.8 billion parameter instruction-tuned language model, fine-tuned from unsloth/Qwen3-0.6B-Base using TRL. It was trained with supervised instruction fine-tuning on a diverse mixture of datasets spanning code, math, and general instruction data, with a focus on robustness to varied question formats. The model is designed for general instruction-following tasks and achieves an overall accuracy of 37.8% across benchmarks including MMLU and ARC Challenge.


Overview

This model, andresnowak/Qwen3-0.6B-instruction-finetuned, is a 0.8 billion parameter instruction-tuned language model. It is a fine-tuned version of unsloth/Qwen3-0.6B-Base, developed by andresnowak using the TRL (Transformer Reinforcement Learning) library.
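As a minimal sketch, the model can be loaded like any causal LM with the `transformers` library. This assumes the repository ships standard `transformers` weights and a chat template (typical for Qwen3 derivatives, but not explicitly stated on this card); the prompt and generation settings are illustrative.

```python
# Illustrative usage sketch; model id from the card, settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "andresnowak/Qwen3-0.6B-instruction-finetuned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user", "content": "What is 15% of 80?"},
]
# Apply the chat template so the prompt matches the instruction-tuning format.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For a model of this size, bf16 inference fits comfortably on a single consumer GPU or even CPU.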

Key Capabilities

  • Instruction Following: Fine-tuned for general instruction-following tasks.
  • Robustness: Random prompt templates were applied during training to improve robustness to varied question phrasing, alongside a high-quality dataset mixture.
  • Diverse Training Data: Utilizes a mixture of datasets covering code (CodeAlpaca, CodeV2), mathematics (OpenMathGsm8k, MathAlgebra, MathGrade, MathV5, TirMath), and general instruction data (NoRobots, FlanV2, IfData, Oasst1, Sciriff, TableGpt, WildChat).

Good For

  • General Purpose Chatbots: Suitable for applications requiring a small, instruction-tuned model to respond to a variety of prompts.
  • Educational Tools: Can be applied in scenarios requiring basic reasoning and problem-solving, given its training on math and science-related datasets.
  • Experimentation: A good candidate for researchers and developers looking to experiment with instruction-tuned models in the 0.8B parameter range, especially those interested in the impact of diverse dataset mixtures and template application during training.

Performance Highlights

The model achieves an overall accuracy of 37.8% across its evaluation benchmarks. Notable individual benchmark results include:

  • ARC Challenge: 46.0%
  • MMLU: 47.2%
  • GPQA: 29.9%
  • Math QA: 24.0%