reaperdoesntknow/TopologicalQwen

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Mar 28, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

TopologicalQwen is a 1.7-billion-parameter Qwen3ForCausalLM model developed by Convergent Intelligence LLC: Research Division. It is distilled from Qwen3-30B-A3B using Topological Knowledge Distillation (TKD), a methodology designed to capture structural information that a standard KL-divergence objective misses. With a 40,960-token context length, it targets complex reasoning tasks, particularly in physics, and is trained to follow a cognitive loop of derivation, self-critique, and synthesis.


TopologicalQwen: Topology-Aware Knowledge Distillation

TopologicalQwen is a 1.7-billion-parameter model from Convergent Intelligence LLC: Research Division, distilled from a Qwen3-30B-A3B teacher using Topological Knowledge Distillation (TKD). Traditional distillation captures only smooth variations in the teacher's output. TKD instead decomposes knowledge transfer into three channels: smooth distillation, jump corrections at conceptual boundaries, and drift corrections for subtle distributional shifts. The goal is for the student to preserve the teacher's structural understanding, not just its surface statistics.
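The three-channel decomposition can be pictured with a toy loss. This is only an illustrative sketch, not the TKD objective itself, which is not specified here. Every name and formula below is an assumption: `three_channel_loss`, the total-variation test used to flag "jump" positions, the sequence-mean comparison used for "drift", and the weights `w_jump` and `w_drift` are all hypothetical choices made for the example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def three_channel_loss(teacher, student, jump_threshold=0.5, w_jump=1.0, w_drift=0.1):
    """Toy three-term loss (hypothetical, not the published TKD formula).

    teacher, student: lists of per-position probability distributions.
    Returns (loss, jump_positions).
    """
    n = len(teacher)
    per_position = [kl_divergence(t, s) for t, s in zip(teacher, student)]

    # Channel 1 (smooth): ordinary per-position KL, averaged over the sequence.
    smooth = sum(per_position) / n

    # Channel 2 (jump): ASSUMPTION - a "conceptual boundary" is any position where
    # the teacher's distribution changes sharply from the previous position.
    jumps = [
        i for i in range(1, n)
        if total_variation(teacher[i], teacher[i - 1]) > jump_threshold
    ]
    jump = sum(per_position[i] for i in jumps) / len(jumps) if jumps else 0.0

    # Channel 3 (drift): ASSUMPTION - compare sequence-averaged distributions to
    # catch a slow shift that no single position shows clearly.
    vocab = len(teacher[0])
    teacher_mean = [sum(t[v] for t in teacher) / n for v in range(vocab)]
    student_mean = [sum(s[v] for s in student) / n for v in range(vocab)]
    drift = kl_divergence(teacher_mean, student_mean)

    return smooth + w_jump * jump + w_drift * drift, jumps


# Tiny two-token vocabulary; the teacher switches preference at position 2.
teacher_probs = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
student_probs = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
loss, jump_positions = three_channel_loss(teacher_probs, student_probs)
```

In this toy setup, a student that matches the teacher exactly gets zero loss, while errors at the flagged jump position are counted twice (once in the smooth term and once in the jump term), which is one simple way to weight boundaries more heavily.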

Key Capabilities & Features

  • Topology-Aware Reasoning: TKD detects and preserves structural information (topic shifts, reasoning mode transitions) during distillation, with the aim of retaining the teacher's reasoning quality at a small scale.
  • DualMind Format: Trained to generate responses in a structured <explore> (derivation), <examine> (self-critique), and <response> (clean answer) format, mimicking a cognitive loop for complex problem-solving.
  • Physics CoT Optimization: Fine-tuned on Chain-of-Thought datasets for differential equations, theoretical mechanics, electromagnetism, and general relativity, making it proficient in scientific reasoning.
  • Efficient Architecture: A 1.7B parameter Qwen3ForCausalLM model with a 40,960-token context length, offering strong performance for its size.
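For downstream use, the DualMind sections can be separated with a small parser. The sketch below is a minimal example under stated assumptions: the helper name `split_dualmind` is hypothetical, and the source names only the opening tags, so the code accepts a section that ends at a matching closing tag, at the next opening tag, or at the end of the text.

```python
import re

# Section names taken from the DualMind description above.
SECTION_TAGS = ("explore", "examine", "response")

# Lookahead that matches just before any DualMind opening tag.
_NEXT_OPEN = r"(?=<(?:" + "|".join(SECTION_TAGS) + r")>)"


def split_dualmind(text):
    """Return {tag: content or None} for the three DualMind sections.

    ASSUMPTION: a section ends at its closing tag if present, otherwise at the
    next opening tag or at the end of the text.
    """
    sections = {}
    for tag in SECTION_TAGS:
        pattern = rf"<{tag}>(.*?)(?:</{tag}>|{_NEXT_OPEN}|\Z)"
        match = re.search(pattern, text, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections


# Hand-written sample in the described format (not real model output).
sample = (
    "<explore>F = ma, so a = F/m = 10 N / 5 kg.</explore>\n"
    "<examine>Units check: N/kg = m/s^2.</examine>\n"
    "<response>a = 2 m/s^2</response>"
)
parts = split_dualmind(sample)
final_answer = parts["response"]
```

Returning `None` for a missing section lets a caller fall back to the raw text when the model does not follow the format.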

What Makes It Different

TopologicalQwen is presented as evidence that a 1.7B model can reach a structural reasoning quality typically associated with much larger models. The authors attribute this to the Discrepancy Calculus (DISC) framework, which mathematically identifies and preserves critical structural features in the teacher's output distribution. The model was trained on a Colab H100, which the authors cite as a demonstration of what the TKD pipeline can do when paired with high-end compute.