CoNDeNse-AI/GLM-5.1-Qwen3-1.7B-CoNDeNse

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: May 16, 2026 | Architecture: Transformer

CoNDeNse-AI/GLM-5.1-Qwen3-1.7B-CoNDeNse is a Qwen3-1.7B-based language model developed by CoNDeNse-AI and fine-tuned with LoRA to strengthen its reasoning. It is part of the CoNDeNse project, which focuses on compressing the reasoning abilities of larger models into smaller, more deployable ones. It was trained on the Jackrong/GLM-5.1-Reasoning-1M-Cleaned dataset and targets tasks that require logical inference and problem-solving.

Model Overview

CoNDeNse-AI/GLM-5.1-Qwen3-1.7B-CoNDeNse is a specialized language model developed by CoNDeNse-AI, built upon the Qwen/Qwen3-1.7B base architecture. This model is a key component of the CoNDeNse project, which aims to distill the complex reasoning capabilities of large language models into more compact and efficient forms suitable for deployment.

Key Characteristics

  • Base Model: Qwen/Qwen3-1.7B, a 1.7 billion parameter model.
  • Fine-tuning Method: LoRA (Low-Rank Adaptation), with target modules including q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj.
  • Training Dataset: Fine-tuned on 75,000 examples from the Jackrong/GLM-5.1-Reasoning-1M-Cleaned dataset, which is designed to strengthen reasoning skills.
  • Training Configuration: 8-bit AdamW optimizer, a learning rate of 2e-4 on a cosine schedule, and an effective batch size of 16. Training used a maximum sequence length of 4096 tokens with packing enabled.
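
The training configuration above can be sketched in plain Python to show how the stated numbers relate to each other. This is an illustration only, not the project's training script: the class name `TrainingSketch`, the function `cosine_lr`, the per-device batch size of 2, the single device, the absence of warmup, the decay to zero, and the step count of 1000 are all assumptions that the model card does not state. Only the learning rate, effective batch size, sequence length, and target modules come from the list above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSketch:
    """Hypothetical container for the hyperparameters listed in the card."""

    # Values stated in the model card.
    peak_lr: float = 2e-4
    effective_batch_size: int = 16
    max_seq_length: int = 4096
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )
    # ASSUMPTION: the card does not say how the effective batch of 16
    # was split between per-device batch, devices, and accumulation.
    per_device_batch_size: int = 2
    num_devices: int = 1

    @property
    def grad_accum_steps(self) -> int:
        """Accumulation steps needed to reach the effective batch size."""
        per_step = self.per_device_batch_size * self.num_devices
        if self.effective_batch_size % per_step != 0:
            raise ValueError("effective batch size must divide evenly")
        return self.effective_batch_size // per_step


def cosine_lr(step: int, total_steps: int, peak_lr: float = 2e-4) -> float:
    """Cosine decay from peak_lr at step 0 to 0.0 at total_steps.

    ASSUMPTION: no warmup phase and a final learning rate of zero.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so that out-of-range steps do not leave the half cosine wave.
    progress = min(max(step, 0), total_steps) / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    cfg = TrainingSketch()
    total_steps = 1000  # hypothetical; the real step count is not published
    print("gradient accumulation steps:", cfg.grad_accum_steps)
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_lr(step, total_steps, cfg.peak_lr))
```

Under these assumptions, a per-device batch of 2 on one device needs 8 accumulation steps to reach the effective batch size of 16, and the learning rate falls to half its peak at the midpoint of training. With packing enabled, several short examples share one 4096-token sequence, so the number of optimizer steps cannot be derived from the example count alone.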

Intended Use Cases

This model is intended for applications where reasoning and logical inference are critical and where a small, efficient model is required. Its training on a dedicated reasoning dataset suggests improved performance on tasks that demand structured, step-by-step thinking. Like other language models, it may hallucinate, so it should be used with appropriate safeguards.