Model Overview
The nicolomonti/qwen3-1.7b-1bit-align-ce-sft is a 1.7-billion-parameter language model built on the Qwen3 architecture. Its key differentiator is a merge-preserving 1-bit adapter applied during supervised fine-tuning (SFT), starting from nicolomonti/otfq_opd_deepscaler_batman_1_7b_original. The approach aims for significant efficiency gains through 1-bit quantization while preserving performance.
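The card does not detail the mechanics of the merge-preserving adapter. As a rough sketch, under the assumption that the adapter stores a sign matrix plus a scalar scale that can be folded exactly back into the base weights (an illustrative assumption, not the repo's actual implementation):

```python
def apply_1bit_adapter(w_base, signs, scale):
    """Merge a hypothetical 1-bit adapter (sign matrix + scalar scale)
    into base weights. 'Merge-preserving' here means the merged weights
    equal base + dequantized adapter exactly, so the adapter-attached
    and merged models behave identically. Illustrative sketch only."""
    return [
        [w + scale * s for w, s in zip(w_row, s_row)]
        for w_row, s_row in zip(w_base, signs)
    ]

W = [[0.10, -0.20], [0.30, 0.05]]   # toy base weights
signs = [[1, -1], [-1, 1]]          # 1-bit adapter deltas
merged = apply_1bit_adapter(W, signs, 0.01)
# merged is approximately [[0.11, -0.21], [0.29, 0.06]]
```

Because the merge is an exact arithmetic fold, no information is lost when the adapter is materialized into the base weights, which is what makes the exact eval parity described below possible.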
Key Characteristics
- 1-bit Quantization: Applies a merge-preserving 1-bit adapter to the q_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj layers for improved efficiency.
- Supervised Fine-Tuning (SFT): Trained with cross-entropy loss only, without distillation or mixed loss functions.
- Training Data: Fine-tuned on a filtered dataset derived from the CE branch of a local alignment pipeline, including bonsai_identity_translated.jsonl, identity_self_cognition_llamafactory_bonsai_messages_translated.jsonl, and china_dealignment_nosys.jsonl.
- Exact Eval Parity: Verification confirmed identical held-out evaluation loss across the adapter, materialized, and merged forms of the model, indicating the 1-bit adapter was integrated losslessly.
- Context Length: Trained with a maximum sequence length of 2048 tokens.
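The card does not specify the exact 1-bit scheme. A minimal sketch of a common sign-based 1-bit quantizer with per-row scales (an assumption for illustration, not the repo's method) shows the kind of compression involved:

```python
def quantize_1bit(weights):
    """Quantize each row of a weight matrix to {-1, +1} with a per-row
    scale (mean absolute value) -- a common sign-based 1-bit scheme,
    used here only as an illustrative assumption about the adapter."""
    signs = [[1 if w >= 0 else -1 for w in row] for row in weights]
    scales = [sum(abs(w) for w in row) / len(row) for row in weights]
    return signs, scales

def dequantize_1bit(signs, scales):
    """Reconstruct an approximate weight matrix from signs and scales."""
    return [[scale * s for s in row] for row, scale in zip(signs, scales)]

W = [[0.4, -0.2, 0.1], [-0.3, 0.5, -0.1]]  # toy weight matrix
signs, scales = quantize_1bit(W)
W_hat = dequantize_1bit(signs, scales)
```

Each weight then costs one bit plus a shared per-row scale, which is where the memory savings behind the efficiency claims above come from.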
Potential Use Cases
- Resource-Constrained Environments: Ideal for applications requiring a smaller memory footprint and faster inference due to its 1-bit quantization.
- Efficient Deployment: Suitable for edge devices or scenarios where computational resources are limited.
- Fine-tuned Language Generation: Can be used for tasks requiring general language understanding and generation, benefiting from its SFT on diverse datasets.