phanviethoang1512/llama3.2-1b-deita-dpo-ref_teacher
The phanviethoang1512/llama3.2-1b-deita-dpo-ref_teacher model is a 1 billion parameter language model fine-tuned from Meta's Llama-3.2-1B architecture, as the repository name indicates. It was trained on the HuggingFaceH4/deita-10k-v0-sft dataset, ending with a final validation loss of 1.0297. This model is designed for general language generation tasks, leveraging its Llama-3.2 base for broad applicability.
Model Overview
This model, named ref_teacher, is a 1 billion parameter language model derived from Meta's Llama-3.2-1B base architecture. It has been fine-tuned on the HuggingFaceH4/deita-10k-v0-sft dataset with the aim of improving its performance on general instruction-following tasks.
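For reference, here is a minimal loading and generation sketch using the standard Hugging Face transformers API. The repository id comes from this card; the dtype choice and the prompt are illustrative assumptions, not part of the published card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "phanviethoang1512/llama3.2-1b-deita-dpo-ref_teacher"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; switch to float16/float32 as needed
    device_map="auto",           # requires accelerate; remove for plain CPU use
)

# Illustrative prompt; a plain causal LM call, no chat template assumed
inputs = tokenizer(
    "Explain the difference between SFT and DPO in one sentence.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```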
Training Details
The training process used a learning rate of 2e-05, a per-device train batch size of 4, and gradient accumulation over 4 steps; the reported total effective batch size of 64 implies that training ran across 4 devices. The model was trained for 3 epochs with the AdamW optimizer, a cosine learning rate scheduler, and a warmup ratio of 0.1. The validation loss rose from 0.9567 after the first epoch to 1.0297 by the third, which may indicate mild overfitting on the small fine-tuning set.
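These hyperparameters map naturally onto transformers TrainingArguments. The actual training script is not published, so the following is only a sketch mirroring the values above; the output directory name is hypothetical.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ref_teacher",        # hypothetical output directory
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # 4 x 4 = 16 per device; 64 total implies 4 devices
    num_train_epochs=3,
    optim="adamw_torch",             # AdamW optimizer
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```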
Key Characteristics
- Base Model: Meta Llama-3.2-1B
- Fine-tuning Dataset: HuggingFaceH4/deita-10k-v0-sft
- Parameter Count: 1 billion
- Context Length: 32768 tokens (both figures can be checked against the published weights; see the sketch below)
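A small verification sketch, assuming the standard Llama config fields exposed by transformers:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "phanviethoang1512/llama3.2-1b-deita-dpo-ref_teacher"

# Context length is recorded in the model config
config = AutoConfig.from_pretrained(model_id)
print("context length:", config.max_position_embeddings)

# Parameter count, summed directly from the loaded weights
model = AutoModelForCausalLM.from_pretrained(model_id)
print("parameters:", sum(p.numel() for p in model.parameters()))
```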
Potential Use Cases
Given its Llama-3.2-1B foundation and fine-tuning on a general instruction dataset, this model is suitable for a range of natural language processing applications, including text generation, summarization, and question answering, wherever a lightweight general-purpose language model is required; a pipeline sketch follows below.
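As an illustration of such use, a text-generation pipeline can phrase summarization or question answering as an instruction. The prompt below is our own example, not from the card.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="phanviethoang1512/llama3.2-1b-deita-dpo-ref_teacher",
)

# Summarization phrased as an instruction (illustrative prompt)
prompt = "Summarize in one sentence: The model was fine-tuned on an instruction dataset."
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```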