nicolomonti/qwen3-1.7b-1bit-align-ce-sft

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 27, 2026 · Architecture: Transformer · Warm

The nicolomonti/qwen3-1.7b-1bit-align-ce-sft model is a 1.7-billion-parameter (listed as 2B) Qwen3-based language model, fine-tuned using a merge-preserving 1-bit adapter. It was developed by nicolomonti with a focus on supervised fine-tuning using plain cross-entropy loss. The 1-bit adapter reduces the memory footprint of the fine-tuned weights and can speed up inference, making the model well suited to efficient deployment.


Model Overview

The nicolomonti/qwen3-1.7b-1bit-align-ce-sft is a 2 billion parameter language model built upon the Qwen3 architecture. Its key differentiator is the application of a merge-preserving 1-bit adapter during supervised fine-tuning (SFT), originating from nicolomonti/otfq_opd_deepscaler_batman_1_7b_original. This approach aims to achieve significant efficiency gains through 1-bit quantization while maintaining performance.
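
The merge-preserving property can be made concrete with a small sketch. This is a hypothetical NumPy illustration, not the author's actual pipeline: applying a scaled 1-bit (sign) update on the fly should produce the same outputs as a model whose base weights have that update folded in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical base projection weight (think of a q_proj matrix) and a
# dense update that gets quantized to 1 bit (its sign) plus a scale.
W_base = rng.normal(size=(64, 64)).astype(np.float32)
delta = rng.normal(size=(64, 64)).astype(np.float32)
scale = 0.02  # per-tensor scale here; real adapters often use per-channel scales

# 1-bit adapter: only the sign of each update weight is kept.
adapter_1bit = np.sign(delta).astype(np.float32)

x = rng.normal(size=(8, 64)).astype(np.float32)

# Path A: keep the adapter separate and apply it at inference time.
y_adapter = x @ W_base.T + scale * (x @ adapter_1bit.T)

# Path B: merge the scaled 1-bit adapter into the base weights first.
W_merged = W_base + scale * adapter_1bit
y_merged = x @ W_merged.T

# "Merge-preserving" means both paths agree up to float tolerance,
# mirroring the exact eval-loss parity reported for this model.
print(np.allclose(y_adapter, y_merged, atol=1e-5))
```

Because matrix multiplication distributes over addition, the two paths differ only by floating-point rounding, which is why adapter, materialized, and merged variants can achieve identical evaluation loss.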

Key Characteristics

  • 1-bit Quantization: Utilizes a merge-preserving 1-bit adapter for q_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj layers, enhancing efficiency.
  • Supervised Fine-Tuning (SFT): Trained exclusively with cross-entropy loss, without distillation or mixed loss functions.
  • Training Data: Fine-tuned on a filtered dataset derived from the CE branch of a local alignment pipeline, including bonsai_identity_translated.jsonl, identity_self_cognition_llamafactory_bonsai_messages_translated.jsonl, and china_dealignment_nosys.jsonl.
  • Exact Eval Parity: Verification confirmed exact held-out evaluation loss parity between the adapter, materialized, and merged models, indicating successful integration of the 1-bit adapter.
  • Context Length: Training used a maximum sequence length of 2048 tokens; the base architecture's context window is listed as 32k.
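
As rough back-of-envelope arithmetic, the sketch below estimates the storage saved by keeping the adapter updates for the listed projection layers at 1 bit instead of 16-bit BF16. The layer shapes are hypothetical placeholders, not the actual Qwen3-1.7B dimensions.

```python
# Hypothetical shapes for the adapted projections in one transformer block;
# the real Qwen3-1.7B hidden size and MLP expansion may differ.
hidden = 2048
proj_shapes = {
    "q_proj": (hidden, hidden),
    "v_proj": (hidden, hidden),
    "o_proj": (hidden, hidden),
    "gate_proj": (4 * hidden, hidden),
    "up_proj": (4 * hidden, hidden),
    "down_proj": (hidden, 4 * hidden),
}

params = sum(rows * cols for rows, cols in proj_shapes.values())
bf16_bytes = params * 2       # BF16: 16 bits = 2 bytes per weight
one_bit_bytes = params // 8   # 1-bit: 8 weights per byte (scales ignored)

print(f"adapted params per block: {params:,}")
print(f"BF16: {bf16_bytes / 2**20:.1f} MiB, 1-bit: {one_bit_bytes / 2**20:.1f} MiB")
```

Under these assumptions the 1-bit representation is 16x smaller than BF16 per adapted layer, which is the source of the reduced memory footprint claimed above.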

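The "cross-entropy only" objective noted above (no distillation or mixed losses) is standard next-token negative log-likelihood. A minimal sketch with toy logits over a hypothetical 4-token vocabulary:

```python
import math

def cross_entropy(logits, target_idx):
    """Negative log-likelihood of the target token under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_idx]

# Toy logits the model might assign at one decoding step.
logits = [2.0, 0.5, -1.0, 0.1]
loss = cross_entropy(logits, target_idx=0)
print(round(loss, 4))  # prints 0.3524
```

In SFT this loss is averaged over the response tokens of each training example; there is no teacher model or auxiliary term in the objective.
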
Potential Use Cases

  • Resource-Constrained Environments: Ideal for applications requiring a smaller memory footprint and faster inference due to its 1-bit quantization.
  • Efficient Deployment: Suitable for edge devices or scenarios where computational resources are limited.
  • Fine-tuned Language Generation: Can be used for tasks requiring general language understanding and generation, benefiting from its SFT on diverse datasets.