iamrahulreddy/Quintus
Quintus-1.7B is a compact 1.7 billion parameter English-focused assistant model developed by iamrahulreddy, built upon Qwen/Qwen3-1.7B-Base. It utilizes online full-vocabulary knowledge distillation from a Qwen3-8B teacher model, followed by targeted supervised fine-tuning. This approach enables it to achieve strong reasoning capabilities, outperforming both its base and official 1.7B instruct counterparts on benchmarks like GSM8K, ARC-Challenge, and WinoGrande.
Quintus-1.7B: A Distilled Reasoning Assistant
Quintus-1.7B is a compact, English-focused AI assistant developed by iamrahulreddy and built on Qwen/Qwen3-1.7B-Base. Its core innovation is the training methodology: online full-vocabulary knowledge distillation from a larger Qwen3-8B teacher model. The teacher's complete vocabulary distribution is streamed live during training, which gives the student a denser signal than traditional sparse top-k logit distillation.
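To make the difference between full-vocabulary and top-k distillation concrete, here is a minimal sketch for a single token position. It is not the project's training code: the forward KL objective, the temperature parameter, the renormalized top-k variant, and the function names are all assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def full_vocab_kd_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) summed over every vocabulary entry.

    Assumption: forward KL is used here for illustration; the source
    does not state the exact objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0.0)

def top_k_kd_loss(teacher_logits, student_logits, k, temperature=1.0):
    """Sparse variant: only the teacher's k most likely tokens contribute,
    with both distributions renormalized over that subset (an assumption)."""
    top = sorted(range(len(teacher_logits)),
                 key=lambda i: teacher_logits[i], reverse=True)[:k]
    return full_vocab_kd_loss([teacher_logits[i] for i in top],
                              [student_logits[i] for i in top],
                              temperature)

# Toy 4-token vocabulary: the student matches the teacher on the top-2
# tokens but puts far too much mass on token 2.
teacher = [2.0, 1.0, 0.0, -1.0]
student = [2.0, 1.0, 3.0, -1.0]
print("top-2 loss:", top_k_kd_loss(teacher, student, k=2))
print("full-vocab loss:", full_vocab_kd_loss(teacher, student))
```

In this toy case the top-2 loss sees no error at all, while the full-vocabulary loss penalizes the probability mass the student misplaces outside the top-k set. That is the sense in which the full distribution carries a denser signal.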
Key Capabilities & Technical Highlights
- Enhanced Reasoning: Quintus-1.7B outperforms both its base model and the official 1.7B instruct model on reasoning benchmarks such as GSM8K, ARC-Challenge, and WinoGrande, at the same parameter count.
- Efficient Distillation: The model employs a two-stage training pipeline: online KD followed by targeted Supervised Fine-Tuning (SFT) for assistant behavior, identity grounding, and generation stability.
- Optimized Training: Features like assistant-only supervision masking, deterministic sequence packing (4096-token context), and the use of acceleration kernels (FlashAttention-2, Liger kernels) contribute to its efficiency.
- Reusable Framework: The project is also designed as a reference pipeline for compact-model distillation, allowing adaptation to other teacher/student pairs.
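Two of the training features listed above, assistant-only supervision masking and deterministic sequence packing, can be sketched in a few lines. This is a simplified illustration, not the project's pipeline: the role labels, the greedy first-fit strategy, the truncation policy, and the function names are assumptions.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def mask_non_assistant(token_ids, roles):
    """Build labels so that only assistant tokens contribute to the loss.

    `roles` holds one (hypothetical) role string per token.
    """
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]

def pack_sequences(sequences, max_len=4096):
    """Greedy first-fit packing in input order.

    With no shuffling or randomness, the same input always yields the same
    packed bins. Over-long sequences are truncated (an assumed policy).
    """
    bins = []
    for seq in sequences:
        seq = list(seq[:max_len])
        for b in bins:
            if len(b) + len(seq) <= max_len:
                b.extend(seq)
                break
        else:
            # No existing bin has room, so start a new one.
            bins.append(seq)
    return bins

labels = mask_non_assistant(
    [11, 12, 13, 14],
    ["user", "user", "assistant", "assistant"],
)
print(labels)

packed = pack_sequences([[1] * 3, [2] * 2, [3] * 2], max_len=4)
print(packed)
```

A real packing implementation would also need to track sequence boundaries so that attention does not leak across packed examples; that detail is omitted here.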
Ideal Use Cases
- Resource-constrained environments: Its 1.7B parameter count makes it suitable for deployment where computational resources are limited.
- Applications requiring strong reasoning: Excels in tasks demanding logical inference and problem-solving, as indicated by its benchmark performance.
- English-centric assistant applications: Optimized for generating precise and logically sound responses in English.
- As a foundation for further fine-tuning: The distilled base provides a strong starting point for specialized assistant behaviors.
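For the resource-constrained use case, a rough back-of-the-envelope estimate of weight memory can help with sizing. The sketch below assumes the nominal 1.7 billion parameter count and counts weights only; activations, KV cache, and runtime overhead are not included, and the exact parameter count of the released checkpoint may differ.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory needed to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3

NOMINAL_PARAMS = 1.7e9  # assumption: nominal size taken from the model name

for name, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gib(NOMINAL_PARAMS, nbytes):.2f} GiB")
```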