WeniGPT-Mistral-7B-instructBase-4bit Overview
WeniGPT-Mistral-7B-instructBase-4bit is an instruction-tuned language model developed by Weni on the Mistral 7B architecture. It was trained with 4-bit quantization via the bitsandbytes library, using nf4 quantization, double quantization, and a bfloat16 compute dtype. This configuration balances performance against memory efficiency.
Key Training Details
- Quantization Method: bitsandbytes with load_in_4bit: True.
- Quantization Type: nf4 with bnb_4bit_use_double_quant: True.
- Compute Dtype: bfloat16 for 4-bit operations.
- Training Epochs: 10 epochs.
- Max Sequence Length: 2048 tokens.
- Optimizer: adamw_torch with a constant_with_warmup learning rate scheduler.
- Learning Rate: 4e-4.
- Gradient Accumulation: 2 steps.
- Gradient Checkpointing: Enabled for memory optimization.
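The settings above map directly onto the Hugging Face transformers API. The following is a minimal sketch, assuming the transformers and bitsandbytes libraries are installed; the output directory name is illustrative, not from the model card.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit nf4 quantization with double quantization and a bfloat16
# compute dtype, matching the training details listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Optimizer, schedule, and memory settings from the training details.
training_args = TrainingArguments(
    output_dir="wenigpt-out",  # illustrative path (assumption)
    num_train_epochs=10,
    learning_rate=4e-4,
    optim="adamw_torch",
    lr_scheduler_type="constant_with_warmup",
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
)
```

Passing `bnb_config` as `quantization_config` when loading the base model keeps the weights in 4-bit form during fine-tuning, which is what makes the 10-epoch run feasible on memory-constrained hardware.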
Intended Use Cases
This model is well-suited for applications where a compact, instruction-following language model is required. Its 4-bit quantization makes it particularly efficient for deployment in environments with limited computational resources, while still retaining the instruction-following capabilities derived from its Mistral base. Developers can leverage this model for various NLP tasks that benefit from instruction tuning, such as text generation, summarization, and question answering, especially when memory footprint is a critical consideration.
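Instruction-tuned Mistral variants typically expect prompts wrapped in the [INST] … [/INST] chat template. Whether WeniGPT uses exactly this template is an assumption here, so verify it against the model's tokenizer configuration. A minimal formatting helper:

```python
def format_instruction(instruction: str) -> str:
    """Wrap a user instruction in the Mistral-style [INST] template.

    Assumes the model follows the base Mistral instruct format,
    which is not confirmed by the model card.
    """
    return f"<s>[INST] {instruction.strip()} [/INST]"

# The resulting string is what gets passed to the tokenizer
# before generation.
prompt = format_instruction("Summarize the following paragraph: ...")
```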