nicolomonti/qwen3-1.7b-1bit-align-ce-sft
The nicolomonti/qwen3-1.7b-1bit-align-ce-sft model is a 2 billion parameter Qwen3-based language model, fine-tuned using a merge-preserving 1-bit adapter. It was developed by nicolomonti with a focus on supervised fine-tuning using cross-entropy loss. This model is optimized for efficient deployment and performance, leveraging 1-bit quantization for reduced memory footprint and faster inference.
Model Overview
The nicolomonti/qwen3-1.7b-1bit-align-ce-sft is a 2 billion parameter language model built upon the Qwen3 architecture. Its key differentiator is the application of a merge-preserving 1-bit adapter during supervised fine-tuning (SFT), starting from the base checkpoint nicolomonti/otfq_opd_deepscaler_batman_1_7b_original. This approach aims to achieve significant efficiency gains through 1-bit quantization while maintaining performance.
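The card does not publish the exact quantization recipe, but the common idea behind 1-bit weight quantization can be sketched as follows: each weight is reduced to its sign, with a per-output-channel scale recovering the overall magnitude. This is an illustrative sketch only, not the model's actual method; the helper names are hypothetical.

```python
import numpy as np

def one_bit_quantize(w: np.ndarray):
    """Quantize a weight matrix to {-1, +1} signs plus a per-row scale.

    Each row keeps only its sign pattern; the scale is the mean absolute
    value of that row, a common recipe for 1-bit weight quantization.
    """
    scale = np.abs(w).mean(axis=1, keepdims=True)   # per-output-channel scale
    signs = np.where(w >= 0, 1.0, -1.0)             # 1-bit sign pattern
    return signs, scale

def one_bit_dequantize(signs: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reconstruct an approximate weight matrix from signs and scales.
    return signs * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
signs, scale = one_bit_quantize(w)
w_hat = one_bit_dequantize(signs, scale)

# Sign patterns survive exactly; per-row magnitudes collapse to one value.
assert np.array_equal(np.sign(w_hat), np.sign(w))
```

In practice the sign matrix packs to 1 bit per weight, while the small per-channel scales stay in higher precision.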
Key Characteristics
- 1-bit Quantization: Utilizes a merge-preserving 1-bit adapter for the `q_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, and `down_proj` layers, enhancing efficiency.
- Supervised Fine-Tuning (SFT): Trained exclusively with cross-entropy loss, without distillation or mixed loss functions.
- Training Data: Fine-tuned on a filtered dataset derived from the CE branch of a local alignment pipeline, including `bonsai_identity_translated.jsonl`, `identity_self_cognition_llamafactory_bonsai_messages_translated.jsonl`, and `china_dealignment_nosys.jsonl`.
- Exact Eval Parity: Verification confirmed exact held-out evaluation loss parity between the adapter, materialized, and merged models, indicating successful integration of the 1-bit adapter.
- Context Length: Supports a maximum sequence length of 2048 tokens during training.
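The "merge-preserving" and "exact eval parity" claims above amount to the property that applying the adapter at inference time and folding it into the base weights ahead of time produce the same outputs. A minimal sketch of that check, assuming a hypothetical 1-bit adapter represented as a sign matrix with a single scale (not the model's actual parameterization):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(3, 16))          # a batch of hidden states
w_base = rng.normal(size=(16, 16))    # frozen base projection weight

# Hypothetical 1-bit adapter: a sign matrix plus one learned scale.
adapter_signs = np.where(rng.normal(size=(16, 16)) >= 0, 1.0, -1.0)
adapter_scale = 0.01

# Path 1: adapter applied on the fly at inference time.
y_adapter = x @ w_base + x @ (adapter_scale * adapter_signs)

# Path 2: adapter merged into the base weights ahead of time.
w_merged = w_base + adapter_scale * adapter_signs
y_merged = x @ w_merged

# Merge-preserving means both paths agree (up to float rounding),
# which is what exact held-out loss parity verifies at model scale.
assert np.allclose(y_adapter, y_merged)
```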
Potential Use Cases
- Resource-Constrained Environments: Ideal for applications requiring a smaller memory footprint and faster inference due to its 1-bit quantization.
- Efficient Deployment: Suitable for edge devices or scenarios where computational resources are limited.
- Fine-tuned Language Generation: Can be used for tasks requiring general language understanding and generation, benefiting from its SFT on diverse datasets.
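To put the memory-footprint claim in concrete terms, a back-of-the-envelope comparison for roughly 1.7 billion weights at 16-bit versus 1-bit storage (ignoring the small overhead of quantization scales and embeddings kept in higher precision):

```python
# Approximate storage for ~1.7B weights at different bit widths.
n_params = 1_700_000_000

fp16_gb = n_params * 2 / 1e9          # 2 bytes per weight
one_bit_gb = n_params / 8 / 1e9       # 1 bit per weight, 8 weights per byte

print(f"fp16:  {fp16_gb:.2f} GB")     # 3.40 GB
print(f"1-bit: {one_bit_gb:.2f} GB")  # 0.21 GB
```

Actual on-disk and in-memory sizes will be somewhat larger once scales, embeddings, and runtime buffers are accounted for.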