Yaseal/llama3_1b_instruct_vallina_full_sft_30k

Text generation · Model size: 1B · Quantization: BF16 · Context length: 32k · Published: Mar 21, 2026 · License: other · Architecture: Transformer

Yaseal/llama3_1b_instruct_vallina_full_sft_30k is a 1-billion-parameter instruction-tuned language model, fine-tuned by Yaseal from the LLM-Research/Llama-3.2-1B-Instruct base model. It was trained on the deepmath_plain_30k_train dataset, reaching a validation loss of 0.5760, and offers a compact option for instruction-following applications in that dataset's domain (likely mathematical problem solving, judging by the dataset name).


Model Overview

Yaseal/llama3_1b_instruct_vallina_full_sft_30k is a 1-billion-parameter instruction-tuned model derived from the LLM-Research/Llama-3.2-1B-Instruct base model. It has been fine-tuned on the deepmath_plain_30k_train dataset, which indicates a specialization in tasks drawn from that dataset's domain.
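As a minimal usage sketch, the snippet below loads the checkpoint with the transformers library and runs one chat-style generation. It assumes the repository is hosted on the Hugging Face Hub under the ID above and that the model inherits the standard Llama 3.2 chat template from its base model; the example prompt is illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Yaseal/llama3_1b_instruct_vallina_full_sft_30k"  # assumes a Hub-hosted repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# Assumes the Llama 3.2 chat template carried over from the base model.
messages = [{"role": "user", "content": "Solve: what is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```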

Key Characteristics

  • Base Model: LLM-Research/Llama-3.2-1B-Instruct
  • Parameter Count: 1 billion
  • Context Length: 32768 tokens
  • Training Data: Fine-tuned on deepmath_plain_30k_train
  • Performance: Validation loss of 0.5760 during fine-tuning
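These figures can be checked against the checkpoint itself; a small sketch, assuming the standard Llama config fields exposed by transformers:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Yaseal/llama3_1b_instruct_vallina_full_sft_30k"

# Config-only check: this call fetches just the config, not the weights.
config = AutoConfig.from_pretrained(model_id)
print("Context length:", config.max_position_embeddings)

# Counting parameters requires materializing the weights.
model = AutoModelForCausalLM.from_pretrained(model_id)
print("Parameters:", sum(p.numel() for p in model.parameters()))
```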

Training Details

The model was trained for 2 epochs with a learning rate of 2e-05, the AdamW optimizer, and a cosine learning-rate schedule with a 0.1 warmup ratio, at a total batch size of 16 across 2 GPUs. This full supervised fine-tuning adapts the base Llama-3.2-1B-Instruct model to the instruction-following behavior represented by the deepmath_plain_30k_train dataset.
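These hyperparameters map directly onto a standard transformers TrainingArguments configuration. The sketch below reproduces them for reference; it is a reconstruction under stated assumptions, not the author's actual training script, and the per-device batch size assumes no gradient accumulation (which the card does not specify).

```python
from transformers import TrainingArguments

# Hyperparameters taken from the training details above; a total batch size
# of 16 across 2 GPUs corresponds to 8 examples per device.
training_args = TrainingArguments(
    output_dir="llama3_1b_instruct_vallina_full_sft_30k",
    num_train_epochs=2,
    learning_rate=2e-5,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=8,  # 8 per device x 2 GPUs = 16 total
    bf16=True,                      # matches the BF16 precision listed above
)
```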

Potential Use Cases

Given its fine-tuning on a specific dataset, this model is likely best suited for:

  • Applications requiring instruction-following capabilities aligned with the deepmath_plain_30k_train dataset's domain.
  • Scenarios where a compact, 1B parameter model is preferred for efficiency while still offering specialized instruction-tuned performance.