Name: iamrahulreddy/Quintus API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: iamrahulreddy

Quintus-1.7B: A Distilled Reasoning Assistant

Quintus-1.7B is a compact, English-focused AI assistant developed by iamrahulreddy and built on Qwen3-1.7B-Base. Its core innovation is the training method: online full-vocabulary knowledge distillation from a larger Qwen3-8B teacher model. During training, the teacher's complete distribution over the vocabulary is streamed live and used as the target, which gives the student a denser signal than sparse top-k logit distillation, where only the teacher's k highest-scoring tokens are kept.
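To make the contrast between full-vocabulary and top-k distillation concrete, here is a minimal pure-Python sketch. It is not the project's training code: the six-token vocabulary, the logits, the function names, and the choice of forward KL divergence with a temperature of 1.0 are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, subtracting the max for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """Forward KL(p || q) between two probability lists of equal length."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

def full_vocab_kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Compare teacher and student over every vocabulary entry."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return kl_divergence(p, q)

def top_k_kd_loss(teacher_logits, student_logits, k, temperature=1.0):
    """Compare only on the teacher's k highest-scoring tokens (renormalised)."""
    order = sorted(range(len(teacher_logits)),
                   key=lambda i: teacher_logits[i], reverse=True)
    top = order[:k]
    p = softmax([teacher_logits[i] for i in top], temperature)
    q = softmax([student_logits[i] for i in top], temperature)
    return kl_divergence(p, q)

# Toy logits (made up): the student matches the teacher on the two most
# likely tokens but puts far too much mass on the third token.
teacher = [4.0, 3.5, 0.5, 0.0, -1.0, -2.0]
student = [4.0, 3.5, 3.0, -1.0, -1.0, -2.0]

full_loss = full_vocab_kd_loss(teacher, student)
topk_loss = top_k_kd_loss(teacher, student, k=2)
print(f"full-vocabulary loss: {full_loss:.4f}")
print(f"top-2 loss:           {topk_loss:.4f}")
```

In this toy case the top-2 loss is zero because the student already agrees with the teacher on those two tokens, while the full-vocabulary loss is positive because it also sees the misplaced mass on the third token. That is the sense in which the full distribution carries a denser signal.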

Key Capabilities & Technical Highlights

Enhanced Reasoning: Quintus-1.7B is reported to outperform both its base model and the official 1.7B instruct model on benchmarks such as GSM8K, ARC-Challenge, and WinoGrande, at the same parameter count.
Efficient Distillation: The model is trained with a two-stage pipeline: online knowledge distillation (KD), followed by targeted Supervised Fine-Tuning (SFT) for assistant behavior, identity grounding, and generation stability.
Optimized Training: Assistant-only supervision masking, deterministic sequence packing into a 4096-token context, and acceleration kernels (FlashAttention-2, Liger kernels) keep training efficient.
Reusable Framework: The project is also designed as a reference pipeline for compact-model distillation, allowing adaptation to other teacher/student pairs.
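Two of the data-preparation steps named above, assistant-only supervision masking and deterministic sequence packing, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the project's pipeline: the function names, the per-token role labels, the first-fit packing strategy, and the truncation of over-long samples are all hypothetical. The value -100 is used because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def mask_non_assistant(token_ids, roles):
    """Keep labels only for assistant tokens; all others are ignored by the loss."""
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]

def pack_sequences(sequences, max_len=4096):
    """First-fit packing in input order.

    No randomness is involved, so the same input always yields the same bins.
    """
    bins = []
    for seq in sequences:
        seq = list(seq[:max_len])  # assumption: over-long samples are truncated
        for current in bins:
            if len(current) + len(seq) <= max_len:
                current.extend(seq)
                break
        else:
            # No existing bin had room, so start a new one.
            bins.append(seq)
    return bins

# Masking: only the assistant's three tokens contribute to the loss.
token_ids = [11, 12, 13, 14, 15]
roles = ["user", "user", "assistant", "assistant", "assistant"]
labels = mask_non_assistant(token_ids, roles)

# Packing: four samples of different lengths into 4096-token bins.
samples = [[0] * 3000, [1] * 2000, [2] * 1000, [3] * 500]
packed = pack_sequences(samples, max_len=4096)
packed_lengths = [len(b) for b in packed]
print(labels)
print(packed_lengths)
```

A real pipeline would also have to stop packed samples from attending to one another, for example through per-sample attention boundaries. That part is not shown here.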

Ideal Use Cases

Resource-constrained environments: Its 1.7B parameter count makes it suitable for deployment where computational resources are limited.
Applications requiring strong reasoning: Suited to tasks that demand logical inference and problem-solving, as its benchmark results indicate.
English-centric assistant applications: Optimized for generating precise and logically sound responses in English.
As a foundation for further fine-tuning: The distilled base provides a strong starting point for specialized assistant behaviors.
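As a rough guide to what "resource-constrained" means for a model of this size, the sketch below estimates the memory needed for the weights alone at several numeric precisions. The figure of 1.7e9 parameters is the nominal count taken from the model name, not an exact one. The precision list is an assumption, since this description does not say which formats the model is distributed in. The KV cache, activations, and runtime overhead are excluded.

```python
NOMINAL_PARAMS = 1.7e9  # nominal count from the "1.7B" in the model name

# Storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,   # hypothetical quantised variant
    "int4": 0.5,   # hypothetical quantised variant
}

def weight_memory_gb(num_params, bytes_per_param):
    """Weight-only memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

estimates = {name: weight_memory_gb(NOMINAL_PARAMS, size)
             for name, size in BYTES_PER_PARAM.items()}

for name, gigabytes in estimates.items():
    print(f"{name}: about {gigabytes:.2f} GB for weights")
```

At 2 bytes per parameter the weights come to roughly 3.4 GB, so actual memory use during inference will be somewhat higher once the excluded items are added.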
