Model Overview
Yaseal/llama3_1b_instruct_vallina_full_sft_30k is a 1-billion-parameter instruction-tuned model derived from the LLM-Research/Llama-3.2-1B-Instruct base model. It was fine-tuned on the deepmath_plain_30k_train dataset, specializing it for tasks in that dataset's domain.
Key Characteristics
- Base Model: LLM-Research/Llama-3.2-1B-Instruct
- Parameter Count: 1 billion parameters
- Context Length: 32768 tokens
- Training Data: Fine-tuned on deepmath_plain_30k_train
- Performance: Achieved a validation loss of 0.5760 during training
Training Details
The model was trained with a learning rate of 2e-05 using the AdamW optimizer and a cosine learning-rate scheduler with a 0.1 warmup ratio. Training ran for 2 epochs with an effective batch size of 16 across 2 GPUs. This fine-tuning process adapts the base Llama-3.2-1B-Instruct model to the instruction-following behavior defined by the deepmath_plain_30k_train dataset.
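The schedule described above (linear warmup over the first 10% of steps, then cosine decay from the peak learning rate) can be sketched as follows. This is a minimal illustration; the exact implementation in the training framework, and the derived step count, are assumptions, not taken from the training logs:

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-05, warmup_ratio=0.1):
    """Cosine learning-rate schedule with linear warmup (illustrative sketch)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Rough step count implied by the card: ~30k examples, batch size 16, 2 epochs.
total_steps = 30000 // 16 * 2  # 3750 optimizer steps
```

With these numbers the learning rate rises linearly over the first 375 steps, peaks at 2e-05, and decays to 0 by step 3750.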
Potential Use Cases
Given its fine-tuning on a specific dataset, this model is likely best suited for:
- Applications requiring instruction-following capabilities aligned with the domain of the deepmath_plain_30k_train dataset.
- Scenarios where a compact 1B-parameter model is preferred for efficiency while still offering specialized instruction-tuned performance.