reaperdoesntknow/TopologicalQwen
Text Generation · Model size: 2B · Quant: BF16 · Context length: 32k · Published: Mar 28, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

TopologicalQwen is a 1.7 billion parameter Qwen3ForCausalLM model developed by Convergent Intelligence LLC: Research Division. It is distilled from Qwen3-30B-A3B using Topological Knowledge Distillation (TKD), a novel methodology that captures structural information beyond standard KL divergence. This model excels at complex reasoning tasks, particularly in physics and mathematics, by learning a cognitive loop of derivation, self-critique, and synthesis.


TopologicalQwen: Topology-Aware Knowledge Distillation

TopologicalQwen is a 1.7 billion parameter model based on the Qwen3ForCausalLM architecture, developed by Convergent Intelligence LLC: Research Division. It stands out due to its unique Topological Knowledge Distillation (TKD) methodology, which goes beyond traditional distillation by decomposing knowledge transfer into three channels: smooth distillation, jump corrections, and drift corrections. This allows the model to preserve the teacher's structural understanding, including topic shifts and reasoning mode transitions, which are often blurred by standard KD methods.
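The card does not publish TKD's actual objective, but the three-channel decomposition it describes can be sketched as a toy loss. The sketch below is purely illustrative, under our own assumptions: the function name, the jump-detection threshold, and the channel weights are all hypothetical, and the "jump"/"drift" signals are approximated from step-to-step KL between consecutive teacher distributions.

```python
import numpy as np

def kl(p, q, eps=1e-9):
    """Per-position KL divergence D(p || q) over the vocab axis."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def tkd_loss(teacher, student, jump_thresh=0.5, jump_w=2.0, drift_w=0.5):
    """Toy three-channel distillation objective (hypothetical sketch).

    teacher, student: [seq_len, vocab] probability distributions.
    - smooth channel: ordinary per-token KL(teacher || student)
    - jump channel:  tokens where the teacher's distribution shifts
      abruptly between steps (topic / reasoning-mode transitions)
      receive extra weight
    - drift channel: penalizes mismatch in the slow step-to-step
      change of the two distributions elsewhere
    """
    smooth = kl(teacher, student)                        # [seq_len]
    t_step = kl(teacher[1:], teacher[:-1])               # teacher step change
    s_step = kl(student[1:], student[:-1])               # student step change
    jumps = t_step > jump_thresh                         # abrupt shifts
    jump_term = np.sum(smooth[1:][jumps])                # re-weight jump tokens
    drift_term = np.sum((t_step - s_step)[~jumps] ** 2)  # match slow drift
    return smooth.mean() + jump_w * jump_term + drift_w * drift_term
```

When teacher and student agree exactly, all three channels vanish; as the student's distribution diverges at jump positions, the jump channel dominates, which is the intuition the card attributes to DISC-style detection.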

Key Capabilities & Features

  • Topology-Aware Distillation: Utilizes Discrepancy Calculus (DISC) to detect and preserve structural features (jumps, drifts) in the teacher's output distribution, leading to superior reasoning quality at a smaller scale.
  • DualMind Format: Trained to generate responses in a structured <explore> (derivation), <examine> (self-critique), and <response> (clean answer) format, mimicking a cognitive loop for enhanced problem-solving.
  • Physics CoT Training: Fine-tuned on specialized Chain-of-Thought datasets covering differential equations, theoretical mechanics, electromagnetism, and general relativity.
  • Efficient Architecture: Features a Qwen3ForCausalLM base with 2.03B parameters (1.7B effective), 40,960 tokens context length, and Grouped Query Attention (GQA).
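The DualMind tags named above come directly from the card; a small parser makes the format concrete. The helper name and the sample completion below are ours, not part of the model's documented tooling.

```python
import re

DUALMIND_TAGS = ("explore", "examine", "response")

def parse_dualmind(text):
    """Split a DualMind-formatted generation into its three stages.

    Returns a dict mapping tag -> content; a missing stage maps to None.
    """
    out = {}
    for tag in DUALMIND_TAGS:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        out[tag] = m.group(1).strip() if m else None
    return out

# Hypothetical completion in the documented format:
sample = (
    "<explore>Apply F = ma to the block on the incline...</explore>"
    "<examine>Check the limiting case theta -> 0: a -> 0, consistent.</examine>"
    "<response>a = g sin(theta)</response>"
)
print(parse_dualmind(sample)["response"])  # a = g sin(theta)
```

Extracting only the `<response>` block is useful in deployment, where the derivation and self-critique stages are internal scratchpad rather than user-facing output.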

What Makes This Different

Unlike other distillation methods that treat the teacher's output as a smooth function, TKD explicitly accounts for discontinuities and structural shifts in the knowledge manifold. This enables TopologicalQwen to achieve reasoning quality typically associated with much larger models, even at 1.7B parameters. It represents the result of applying the proven TKD methodology with premium compute (Colab H100, BF16 precision) to a 30B-parameter teacher, demonstrating that structure can indeed beat scale.

Good For

  • Complex Reasoning Tasks: Particularly in scientific and mathematical domains requiring structured derivation and self-correction.
  • Small-Scale Deployment: Offers advanced reasoning capabilities in a compact 1.7B parameter model, suitable for resource-constrained environments.
  • Research & Development: Ideal for exploring advanced knowledge distillation techniques and structured cognitive architectures in LLMs.
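For deployment and experimentation, the checkpoint should load like any Qwen3ForCausalLM model via `transformers`. A minimal sketch, assuming the standard chat-template workflow applies to this repo (the prompt and generation settings are our own and not from the card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "reaperdoesntknow/TopologicalQwen"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Derive the period of a simple pendulum."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Given the DualMind training format, completions may arrive wrapped in `<explore>`/`<examine>`/`<response>` tags, so downstream code should be prepared to strip or route those stages.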