reaperdoesntknow/TopologicalQwen
TopologicalQwen by Convergent Intelligence LLC is a 1.7 billion parameter Qwen3ForCausalLM model distilled from Qwen3-30B-A3B using Topological Knowledge Distillation (TKD). This methodology captures structural information like topic shifts and reasoning mode transitions, which standard distillation methods miss. It is optimized for complex reasoning tasks, particularly in physics, and operates in a unique DualMind format for enhanced cognitive processing.
TopologicalQwen: Topology-Aware Knowledge Distillation
TopologicalQwen is a 1.7 billion parameter model developed by Convergent Intelligence LLC, distilled from a 30B Qwen3 teacher using a novel Topological Knowledge Distillation (TKD) methodology. Unlike standard distillation that only captures smooth variations, TKD decomposes knowledge transfer into three channels: smooth distillation, jump corrections at conceptual boundaries, and drift corrections for gradual distributional shifts. This allows the student model to preserve the teacher's structural understanding, not just its surface statistics.
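The card does not publish the exact TKD objective, so the following is only a minimal sketch, assuming a PyTorch training loop, of how the three channels might be combined: a standard per-token KL term (smooth), an upweighted term at positions where the teacher's distribution shifts abruptly between adjacent tokens (a crude stand-in for DISC-style jump detection), and a term matching slow shifts in mean logits (drift). The function name, loss weights, and threshold are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def tkd_loss(student_logits, teacher_logits,
             lambda_jump=0.5, lambda_drift=0.1, jump_threshold=2.0, tau=2.0):
    """Illustrative three-channel TKD loss (hypothetical, not the authors' code).

    student_logits, teacher_logits: (batch, seq_len, vocab)
    """
    t = F.log_softmax(teacher_logits / tau, dim=-1)
    s = F.log_softmax(student_logits / tau, dim=-1)

    # Channel 1: smooth distillation -- per-token KL(teacher || student).
    kl_per_token = F.kl_div(s, t, log_target=True, reduction="none").sum(-1)
    smooth = kl_per_token.mean()

    # Channel 2: jump corrections -- upweight tokens where the teacher's
    # distribution changes abruptly between adjacent positions (a rough
    # proxy for boundaries such as topic shifts or reasoning-mode changes).
    with torch.no_grad():
        step_kl = F.kl_div(t[:, :-1], t[:, 1:], log_target=True,
                           reduction="none").sum(-1)          # (batch, seq-1)
        jump_mask = (step_kl > jump_threshold).float()
    jump = (kl_per_token[:, 1:] * jump_mask).sum() / jump_mask.sum().clamp(min=1.0)

    # Channel 3: drift corrections -- align the slow, sequence-level shift
    # of the two distributions via their mean logits.
    drift = F.mse_loss(student_logits.mean(dim=1), teacher_logits.mean(dim=1))

    return smooth + lambda_jump * jump + lambda_drift * drift
```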
Key Capabilities & Features
- Topology-Aware Distillation: Utilizes Discrepancy Calculus (DISC) to detect and preserve structural features (topic shifts, reasoning mode transitions) from the 30B teacher model.
- DualMind Format: Trained to operate in a unique <explore> (derivation), <examine> (self-critique), and <response> (clean answer) cognitive loop, mimicking a dialectical reasoning process (see the usage sketch after this list).
- Physics CoT Optimization: Fine-tuned on Chain-of-Thought datasets for differential equations, theoretical mechanics, electromagnetism, and general relativity, making it proficient in complex scientific reasoning.
- Efficient Reasoning: Achieves high structural reasoning quality at a significantly smaller 1.7B parameter count, demonstrating that TKD transfers structural understanding effectively; distillation was performed in BF16 on H100 hardware.
- Qwen3 Architecture: Built on the Qwen3ForCausalLM architecture with a 40,960 token context length and Grouped-Query Attention (GQA).
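A minimal usage sketch with transformers follows. The repository id comes from this card and the tag names from the list above, but the exact prompt template the model expects (system prompt, tag placement) is an assumption; consult the repository's chat template for the authoritative format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "reaperdoesntknow/TopologicalQwen"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# DualMind prompting: the model is trained to fill the cognitive loop
# <explore> (derivation) -> <examine> (self-critique) -> <response> (answer).
# Whether the tags must appear in the prompt or are emitted automatically
# is an assumption here; check the repo's chat template.
messages = [
    {"role": "user",
     "content": "Derive the period of a simple pendulum for small oscillations."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024,
                         do_sample=True, temperature=0.6)
# Keep special tokens so the <explore>/<examine>/<response> structure is visible.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=False))
```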
When to Use This Model
- Complex Reasoning Tasks: Ideal for applications requiring structured, multi-step reasoning, especially in scientific or mathematical domains.
- Cognitive Simulation: Useful for simulating internal thought processes (exploration, critique, synthesis) in AI agents.
- Resource-Constrained Environments: Offers advanced reasoning capabilities in a compact 1.7B parameter size, suitable for deployment where larger models are impractical (a quantized-loading sketch follows this list).
- Research in Distillation: A prime example of advanced knowledge distillation techniques, demonstrating how structural information can be preserved beyond surface-level statistics.
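For constrained deployments, a 1.7B model fits comfortably on modest GPUs, and 4-bit loading shrinks the footprint further. The configuration below is an illustrative bitsandbytes setup, not a recommendation from the model authors; adjust it to your hardware and accuracy needs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "reaperdoesntknow/TopologicalQwen"

# Illustrative 4-bit NF4 quantization; settings are assumptions, not official.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```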