phanviethoang1512/llama3.2-1b-deita-dpo-student_sft_init
phanviethoang1512/llama3.2-1b-deita-dpo-student_sft_init is a 1-billion-parameter causal language model fine-tuned from Meta's Llama-3.2-1B. It was trained on the HuggingFaceH4/deita-10k-v0-sft dataset and achieved a validation loss of 1.1767. The model targets general language generation, building on its Llama-3.2 base with additional instruction-following capability.
Model Overview
This model, phanviethoang1512/llama3.2-1b-deita-dpo-student_sft_init, is a 1 billion parameter language model derived from Meta's Llama-3.2-1B architecture. It has been fine-tuned using the HuggingFaceH4/deita-10k-v0-sft dataset, aiming to enhance its instruction-following and general language generation capabilities.
Training Details
The model was trained for 3 epochs with a learning rate of 2e-05 and an effective batch size of 64 (a per-device train_batch_size of 4 with gradient_accumulation_steps of 4, which implies training across 4 devices). The optimizer was AdamW with a cosine learning rate scheduler and a warmup ratio of 0.1. Training converged to a final validation loss of 1.1767.
Key Characteristics
- Base Model: Meta Llama-3.2-1B
- Parameter Count: 1 billion
- Context Length: 32768 tokens
- Fine-tuning Dataset: HuggingFaceH4/deita-10k-v0-sft
- Achieved Validation Loss: 1.1767
Intended Use Cases
Given its fine-tuning on an instruction-following dataset, this model is suitable for tasks requiring:
- General text generation
- Instruction-based prompting
- Exploration of smaller, fine-tuned Llama-3.2 variants
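For the use cases above, the model can be loaded with the standard `transformers` causal-LM API. This is a minimal sketch: the prompt format is an assumption (the card does not document a chat template), and generation settings are illustrative defaults, not recommended values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "phanviethoang1512/llama3.2-1b-deita-dpo-student_sft_init"

# Downloads weights from the Hugging Face Hub on first use.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Plain-text prompting assumed; adjust if the model expects a chat template.
prompt = "Explain the difference between a list and a tuple in Python."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```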