flammenai/FlameDesigner-Qwen2.5-3B-v1
FlameDesigner-Qwen2.5-3B-v1 is a 3.1 billion parameter instruction-tuned causal language model developed by flammenai, based on the Qwen2.5-3B-Instruct architecture. This model was fine-tuned using a Supervised Fine-Tuning (SFT) approach with a 32768 token context length. It is optimized for general language understanding and generation tasks, leveraging 4-bit quantization for efficient deployment.
Loading preview...
Model Overview
FlameDesigner-Qwen2.5-3B-v1 is a 3.1 billion parameter language model developed by flammenai, built upon the Qwen/Qwen2.5-3B-Instruct base model. It has undergone Supervised Fine-Tuning (SFT) to enhance its instruction-following capabilities and general performance.
Training Configuration Highlights
This model was trained with a focus on efficiency and performance, utilizing specific parameters:
- Base Model:
Qwen/Qwen2.5-3B-Instruct - Training Mode: Supervised Fine-Tuning (SFT)
- Max Sequence Length: 2048 tokens (during training)
- Quantization: 4-bit (NF4) for reduced memory footprint and faster inference.
- LoRA Parameters: Employed LoRA with a rank of 128 and alpha of 128, targeting key attention and feed-forward modules (
up_proj,down_proj,gate_proj,k_proj,q_proj,v_proj,o_proj). - Optimizer:
paged_adamw_8bitfor efficient memory usage during training.
Potential Use Cases
Given its instruction-tuned nature and efficient 3.1B parameter size, FlameDesigner-Qwen2.5-3B-v1 is suitable for a variety of applications where a balance between performance and resource consumption is crucial. It can be effectively used for:
- General text generation and completion.
- Instruction-following tasks.
- Chatbot development and conversational AI.
- Summarization and question-answering in resource-constrained environments.