Weni/WeniGPT-Mistral-7B-instructBase-4bit
Weni/WeniGPT-Mistral-7B-instructBase-4bit is a 7 billion parameter instruction-tuned language model developed by Weni, based on the Mistral architecture. It is trained with 4-bit quantization via bitsandbytes (nf4 with double quantization) for efficient deployment, making it suitable for tasks that require a compact yet capable instruction-following model in resource-constrained environments.
WeniGPT-Mistral-7B-instructBase-4bit Overview
WeniGPT-Mistral-7B-instructBase-4bit is an instruction-tuned language model developed by Weni, built upon the Mistral 7B architecture. This model is notable for its training with 4-bit quantization using the bitsandbytes library, specifically utilizing nf4 quantization and double quantization with bfloat16 compute dtype. This configuration aims to provide a balance between performance and memory efficiency.
Key Training Details
- Quantization Method: `bitsandbytes` with `load_in_4bit: True`
- Quantization Type: `nf4` with `bnb_4bit_use_double_quant: True`
- Compute Dtype: `bfloat16` for 4-bit operations
- Training Epochs: 10
- Max Sequence Length: 2048 tokens
- Optimizer: `adamw_torch` with a `constant_with_warmup` learning rate scheduler
- Learning Rate: 4e-4
- Gradient Accumulation: 2 steps.
- Gradient Checkpointing: Enabled for memory optimization.
Intended Use Cases
This model is well-suited for applications where a compact, instruction-following language model is required. Its 4-bit quantization makes it particularly efficient for deployment in environments with limited computational resources, while still retaining the instruction-following capabilities derived from its Mistral base. Developers can leverage this model for various NLP tasks that benefit from instruction tuning, such as text generation, summarization, and question answering, especially when memory footprint is a critical consideration.
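For instruction-following use, the prompt usually needs to match the base model's chat template. The card does not state the template, so the `[INST]` format of Mistral-7B-Instruct is assumed here as a reasonable default; the helper name `build_prompt` is illustrative:

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in Mistral's [INST] chat format.

    Assumption: the model card does not document a prompt template;
    Mistral-7B-Instruct's [INST] ... [/INST] convention is used here.
    """
    return f"<s>[INST] {instruction.strip()} [/INST]"

# Example: a summarization-style instruction.
prompt = build_prompt("Summarize the following text in one sentence.")
print(prompt)
```

The resulting string can then be tokenized and passed to the model's `generate` method via the `transformers` library.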