kosa-labs/kosa-4B-it-v1

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 12, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

kosa-labs/kosa-4B-it-v1 is a 4 billion parameter instruction-tuned causal language model developed by Kosa Labs, built upon Qwen/Qwen3-4B-Instruct-2507. This model features a 32768 token context length and demonstrates significant performance improvements across reasoning and instruction-following benchmarks, including GSM8K, IFEval, ARC-Challenge, and MMLU. It is optimized for enhanced accuracy in complex problem-solving and general instruction adherence.

Loading preview...

kosa-4B-it-v1: Enhanced Instruction-Tuned Model

kosa-4B-it-v1 is a 4 billion parameter instruction-tuned model developed by Kosa Labs, an independent UK-based lab. It is built on the Qwen/Qwen3-4B-Instruct-2507 architecture, featuring a substantial 32768 token context length.

Key Capabilities & Performance

This model demonstrates notable improvements over its base model across several critical benchmarks, indicating enhanced reasoning and instruction-following abilities:

  • GSM8K (Mathematical Reasoning): Achieves 84.23% (strict) and 85.60% (flexible), significantly outperforming the base model's 73.24% and 79.15% respectively.
  • IFEval (Instruction Following): Shows strong performance with 85.77% (prompt strict) and 90.29% (instruction strict).
  • ARC-Challenge (Common Sense Reasoning): Improved to 52.13% (acc_norm) from 43.09%.
  • MMLU (General Knowledge): Reaches 65.76%, up from 61.89%.

Overall, kosa-4B-it-v1 achieves an average benchmark score of 77.30%, a substantial increase from the base model's 71.56%. These evaluations were conducted under identical settings using lm-evaluation-harness 0.4.12, vLLM, and bfloat16, with rigorous training data verification against benchmark test sets.

Usage & Availability

The model is readily available for use with the Hugging Face transformers library. GGUF quantizations (Q4_K_M, Q5_K_M, Q8_0) are also provided for efficient local deployment.