iamrahulreddy/Quintus

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 13, 2026 | License: mit | Architecture: Transformer | Open Weights | Cold

Quintus-1.7B is a compact 1.7 billion parameter English-focused assistant model developed by iamrahulreddy, built upon Qwen/Qwen3-1.7B-Base. It utilizes online full-vocabulary knowledge distillation from a Qwen3-8B teacher model, followed by targeted supervised fine-tuning. This approach enables it to achieve strong reasoning capabilities, outperforming both its base and official 1.7B instruct counterparts on benchmarks like GSM8K, ARC-Challenge, and WinoGrande.

Quintus-1.7B: A Distilled Reasoning Assistant

Quintus-1.7B is a compact, English-focused AI assistant developed by iamrahulreddy, leveraging the Qwen3-1.7B-Base architecture. Its core innovation lies in its training methodology: online full-vocabulary knowledge distillation from a more powerful Qwen3-8B teacher model. This process streams the teacher's complete vocabulary distribution live, providing a denser signal than traditional sparse top-k logit distillation.
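
To illustrate why a full-vocabulary signal is denser than a top-k one, here is a minimal sketch in plain Python. It is not the project's training code: the function names, the forward KL direction (teacher to student), the temperature parameter, and the renormalised top-k variant are all assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def full_vocab_kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL(teacher || student) summed over every vocabulary entry.

    Assumption: forward KL is used here only as a common choice of
    distillation objective; the source does not state the exact loss.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0.0)

def top_k_kd_loss(teacher_logits, student_logits, k, temperature=1.0):
    """The same KL restricted to the teacher's k most likely tokens.

    Assumption: both distributions are renormalised over the kept tokens,
    which is one of several top-k variants.
    """
    top = sorted(range(len(teacher_logits)),
                 key=lambda i: teacher_logits[i], reverse=True)[:k]
    return full_vocab_kd_loss([teacher_logits[i] for i in top],
                              [student_logits[i] for i in top],
                              temperature)

# Toy 4-token vocabulary: the student matches the teacher on the two most
# likely tokens but wrongly puts a lot of mass on the last token.
teacher = [2.0, 1.0, 0.0, -1.0]
student = [2.0, 1.0, -3.0, 3.0]
print("top-2 loss:     ", top_k_kd_loss(teacher, student, k=2))
print("full-vocab loss:", full_vocab_kd_loss(teacher, student))
```

In this toy case the top-2 loss is zero because the student agrees with the teacher on the kept tokens, while the full-vocabulary loss is positive because it also sees the disagreement in the tail of the distribution.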

Key Capabilities & Technical Highlights

  • Enhanced Reasoning: Quintus-1.7B outperforms both its base model (Qwen3-1.7B-Base) and the official 1.7B instruct model on reasoning benchmarks such as GSM8K, ARC-Challenge, and WinoGrande, at the same parameter count.
  • Efficient Distillation: The model employs a two-stage training pipeline: online KD followed by targeted Supervised Fine-Tuning (SFT) for assistant behavior, identity grounding, and generation stability.
  • Optimized Training: Training uses assistant-only supervision masking, deterministic sequence packing (4096-token context), and acceleration kernels (FlashAttention-2, Liger kernels) to improve efficiency.
  • Reusable Framework: The project is also designed as a reference pipeline for compact-model distillation, allowing adaptation to other teacher/student pairs.
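
Two of the training features listed above, assistant-only supervision masking and deterministic sequence packing, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's implementation: `mask_labels`, `pack_sequences`, the per-token role list, the first-fit packing strategy, and truncation of over-long sequences are all hypothetical. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def mask_labels(token_ids, roles):
    """Keep labels only for assistant tokens.

    `roles` is a hypothetical per-token list such as "user" or "assistant";
    every non-assistant position is replaced with IGNORE_INDEX.
    """
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]

def pack_sequences(sequences, max_len=4096):
    """Greedy first-fit packing in input order.

    No randomness is involved, so the same input always yields the same
    bins. Truncating over-long sequences is an assumption of this sketch.
    """
    bins = []
    for seq in sequences:
        seq = list(seq[:max_len])
        for b in bins:
            if len(b) + len(seq) <= max_len:
                b.extend(seq)  # first bin with enough room wins
                break
        else:
            bins.append(seq)   # no bin had room: open a new one
    return bins

# Only the two assistant tokens keep their labels.
print(mask_labels([11, 12, 13, 14],
                  ["user", "user", "assistant", "assistant"]))

# Three short sequences packed into bins of at most 5 tokens.
print(pack_sequences([[1] * 3, [2] * 4, [3] * 2], max_len=5))
```

A real pipeline would pack the masked labels alongside the token ids and track sequence boundaries for attention; those details are omitted here.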

Ideal Use Cases

  • Resource-constrained environments: Its 1.7B parameter count makes it suitable for deployment where computational resources are limited.
  • Applications requiring strong reasoning: Suited to tasks that demand logical inference and problem-solving, as indicated by its results on GSM8K, ARC-Challenge, and WinoGrande.
  • English-centric assistant applications: Optimized for generating precise and logically sound responses in English.
  • As a foundation for further fine-tuning: The distilled base provides a strong starting point for specialized assistant behaviors.