reaperdoesntknow/DistilQwen3-1.7B-uncensored

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Mar 25, 2026 | Architecture: Transformer

reaperdoesntknow/DistilQwen3-1.7B-uncensored is a 1.7-billion-parameter model in the DistilQwen3 series from Convergent Intelligence LLC: Research Division. It is part of a proof-weighted distillation chain built on Discrepancy Calculus, a measure-theoretic framework for quantifying local structural mismatch. The model targets structural understanding and reasoning-critical tasks. It was distilled from a 30B-parameter teacher model using BF16 training on H100 hardware.


Model Overview

reaperdoesntknow/DistilQwen3-1.7B-uncensored is a 1.7-billion-parameter model developed by Convergent Intelligence LLC: Research Division as part of its DistilQwen3 series. It was produced by proof-weighted knowledge distillation from a 30B-parameter teacher model and trained on H100 hardware in BF16 precision.

Key Differentiators & Methodology

The model's core idea is Discrepancy Calculus (DISC). DISC is a measure-theoretic framework that decomposes the teacher's output distribution to identify and address local structural mismatches. Unlike standard KL divergence, the DISC operator quantifies these local discrepancies directly. This pushes the student model to prioritize structural understanding over surface-level pattern matching.

  • Proof-Weighted Distillation: The training objective has two parts. A cross-entropy term, weighted at 55%, carries proof weights that decay from 2.5x to 1.5x. A KL-divergence term, weighted at 45%, is computed at temperature T = 2.0. The proof weights amplify the loss on reasoning-critical tokens.
  • Hardware & Precision: Many models in the broader Convergent Intelligence portfolio were trained on CPU at FP32. The DistilQwen series, including this model, was instead trained on H100 GPUs with BF16 mixed-precision training.
  • Uncensored Variant: This model is labeled an "uncensored" variant, which suggests it applies fewer content restrictions than other models.
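The proof-weighted objective from the first bullet can be sketched in pure Python over a handful of token positions. This is an illustration under stated assumptions, not the authors' training code. It assumes the following:

  • The proof weight decays linearly over training from 2.5 to 1.5.
  • The proof weight applies only to tokens flagged as reasoning-critical. The card does not say how tokens are flagged, so `critical_mask` is a hypothetical input.
  • All other tokens get weight 1.0.
  • The KL term is KL(teacher || student) between temperature-softened distributions, scaled by T^2 as in common distillation practice.

The names `proof_weight` and `distillation_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def proof_weight(step: int, total_steps: int, start: float = 2.5, end: float = 1.5) -> float:
    """Hypothetical schedule: linear decay from `start` to `end` across training steps."""
    if total_steps <= 1:
        return end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * frac


def distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    critical_mask: Sequence[bool],  # hypothetical: which tokens count as reasoning-critical
    step: int,
    total_steps: int,
    ce_share: float = 0.55,
    kl_share: float = 0.45,
    temperature: float = 2.0,
) -> float:
    """55% proof-weighted cross-entropy + 45% temperature-scaled KL, averaged over tokens."""
    w_proof = proof_weight(step, total_steps)
    ce_sum = 0.0
    kl_sum = 0.0
    for s, t, y, critical in zip(student_logits, teacher_logits, targets, critical_mask):
        # Cross-entropy against the ground-truth token, up-weighted on critical tokens.
        weight = w_proof if critical else 1.0
        ce_sum += -weight * math.log(softmax(s)[y])
        # KL(teacher || student) between the temperature-softened distributions.
        p_t = softmax(t, temperature)
        p_s = softmax(s, temperature)
        kl_sum += sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    n = len(targets)
    ce = ce_sum / n
    kl = (temperature ** 2) * kl_sum / n  # T^2 scaling is an assumed convention
    return ce_share * ce + kl_share * kl
```

In this sketch the proof weight only rescales the cross-entropy term. Raising it therefore shifts gradient signal toward the flagged tokens, while the KL term still ties the student to the teacher's full softened distribution.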

Use Cases

Given its Discrepancy Calculus-based, proof-weighted distillation, this model is particularly well suited to:

  • Tasks requiring structural understanding and logical inference.
  • Applications where reasoning-critical tokens are paramount.
  • Scenarios benefiting from a model trained to allocate capacity to structural understanding rather than just surface-level patterns.

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model set the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
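As a rough guide to how the logit-shaping parameters above interact, the following pure-Python sketch applies temperature, top_k, top_p and min_p to a single logit vector. It uses one common order: temperature, then top-k, then top-p, then min-p. That ordering, the convention that top_k = 0 disables the filter, and treating a non-positive temperature as greedy decoding are all assumptions; inference engines differ on these details. The function name `filter_distribution` is hypothetical. frequency_penalty, presence_penalty and repetition_penalty depend on previously generated tokens, so they are left out.

```python
import math
from typing import Dict, List


def filter_distribution(
    logits: List[float],
    temperature: float = 1.0,
    top_k: int = 0,  # assumed convention: 0 disables the filter
    top_p: float = 1.0,
    min_p: float = 0.0,
) -> Dict[int, float]:
    """Return the renormalized {token_id: probability} map left after filtering."""
    if temperature <= 0:
        # Assumption: non-positive temperature means greedy decoding.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return {best: 1.0}
    # Temperature-scaled softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    z = sum(exps)
    probs = {i: e / z for i, e in enumerate(exps)}
    # Token ids ordered from most to least likely.
    ranked = sorted(probs, key=probs.get, reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    if top_p < 1.0:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    if min_p > 0.0:
        # Drop tokens less likely than min_p times the most likely token.
        threshold = min_p * probs[ranked[0]]
        ranked = [i for i in ranked if probs[i] >= threshold]
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}
```

For example, `filter_distribution([2.0, 1.0, 0.0, -1.0], top_p=0.8)` keeps tokens 0 and 1. Their probabilities (about 0.64 and 0.24) form the smallest set whose cumulative probability reaches 0.8.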