activeDap/Llama-3.2-3B_ultrafeedback_chosen
activeDap/Llama-3.2-3B_ultrafeedback_chosen is a 3.2 billion parameter causal language model fine-tuned by activeDap from the Meta Llama-3.2-3B base model. It was fine-tuned on the activeDap/ultrafeedback_chosen dataset to generate high-quality, preferred responses in a prompt-completion format, making it suited to tasks that require refined conversational or instructional outputs. Training used supervised fine-tuning with assistant-only loss.
Model Overview
This model, activeDap/Llama-3.2-3B_ultrafeedback_chosen, is a specialized fine-tuned version of the Meta Llama-3.2-3B base model. It has 3.2 billion parameters and was trained by activeDap using the activeDap/ultrafeedback_chosen dataset, which focuses on high-quality, preferred responses.
Key Capabilities
- Instruction Following: Enhanced ability to follow instructions and generate relevant, high-quality responses due to fine-tuning on a curated feedback dataset.
- Prompt-Completion Generation: Optimized for generating completions based on given prompts, with a focus on assistant-style outputs.
- Efficient Performance: As a 3.2 billion parameter model, it offers a balance between performance and computational efficiency, suitable for various deployment scenarios.
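The prompt-completion usage described above can be sketched with the standard Transformers generation API. This is a minimal example, assuming the model repository ships a tokenizer with a chat template; the prompt text is illustrative, not from the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "activeDap/Llama-3.2-3B_ultrafeedback_chosen"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Assistant-style prompt-completion: one user turn, model generates the reply.
messages = [
    {"role": "user", "content": "Summarize what supervised fine-tuning does in one sentence."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

At 3.2 billion parameters the model fits comfortably on a single consumer GPU in half precision, which is the deployment scenario the capability list alludes to.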
Training Details
The model underwent supervised fine-tuning (SFT) using the Transformers and TRL libraries. Training ran for 808 steps and reached a final training loss of 1.4808, with a total batch size of 64 across 4 GPUs, a learning rate of 2e-05, and a maximum sequence length of 512 tokens. Assistant-only loss was applied, meaning the loss was computed only on the assistant's response tokens rather than on the prompt, focusing optimization on response quality.