phanviethoang1512/llama3.2-1b-deita-dpo-student_sft_init

TEXT GENERATION · Concurrency cost: 1 · Model size: 1B · Quant: BF16 · Ctx length: 32k · Published: Mar 31, 2026 · License: llama3.2 · Architecture: Transformer

phanviethoang1512/llama3.2-1b-deita-dpo-student_sft_init is a 1-billion-parameter causal language model fine-tuned from Meta's Llama-3.2-1B. It was trained on the HuggingFaceH4/deita-10k-v0-sft dataset and reached a final validation loss of 1.1767. The model targets general language generation, adding instruction-following behavior on top of its Llama-3.2 base.


Model Overview

This model, phanviethoang1512/llama3.2-1b-deita-dpo-student_sft_init, is a 1 billion parameter language model derived from Meta's Llama-3.2-1B architecture. It has been fine-tuned using the HuggingFaceH4/deita-10k-v0-sft dataset, aiming to enhance its instruction-following and general language generation capabilities.

Training Details

The model was trained for 3 epochs with a learning rate of 2e-05 and a total batch size of 64 (a per-device train_batch_size of 4 with gradient_accumulation_steps of 4, which implies data-parallel training across four devices). Optimization used AdamW with a cosine learning-rate scheduler and a warmup ratio of 0.1. Training concluded with a final validation loss of 1.1767.
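The schedule described above can be sketched in a few lines: linear warmup over the first 10% of steps up to the 2e-05 peak, then cosine decay toward zero. The step counts below are illustrative; the card does not state the actual number of training steps.

```python
import math

def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 2e-05, warmup_ratio: float = 0.1) -> float:
    """Linear warmup followed by cosine decay, using the card's
    hyperparameters (peak 2e-05, warmup_ratio 0.1)."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

With 1000 illustrative steps, the rate climbs linearly to 2e-05 at step 100 and then decays smoothly to zero by step 1000.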

Key Characteristics

  • Base Model: Meta Llama-3.2-1B
  • Parameter Count: 1 billion
  • Context Length: 32768 tokens
  • Fine-tuning Dataset: HuggingFaceH4/deita-10k-v0-sft
  • Achieved Validation Loss: 1.1767
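As a rough sanity check on hardware requirements (an illustration, not a figure from the card): at BF16, each parameter occupies 2 bytes, so the nominal 1 billion weights take a little under 2 GiB before accounting for activations or KV cache.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights alone; BF16 = 2 bytes/param.
    Excludes activations, optimizer state, and KV cache."""
    return n_params * bytes_per_param / 2**30

# Nominal 1B parameters at BF16 -> roughly 1.86 GiB of weights.
print(round(weight_memory_gib(1e9), 2))  # 1.86
```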

Intended Use Cases

Given its fine-tuning on an instruction-following dataset, this model is suitable for tasks requiring:

  • General text generation
  • Instruction-based prompting
  • Exploration of smaller, fine-tuned Llama-3.2 variants
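For those use cases, a minimal generation sketch with Hugging Face transformers is shown below. It assumes the checkpoint ships a tokenizer with a chat template (standard for Llama-3.2 derivatives); the prompt format actually used during fine-tuning is not stated on the card, so treat this as a starting point rather than the authors' exact inference setup.

```python
MODEL_ID = "phanviethoang1512/llama3.2-1b-deita-dpo-student_sft_init"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint in BF16 and generate a reply to a single
    user message. Imports are deferred so this module can be inspected
    without torch/transformers installed."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate("Summarize what instruction tuning does in one sentence."))
```

Downloading the ~2 GB checkpoint happens on first call; pass a `device_map` or move the model to GPU for faster generation.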