ramzanniaz331/llama3-8b-full-sft-v3
ramzanniaz331/llama3-8b-full-sft-v3 is an 8-billion-parameter Llama 3.1-based language model fine-tuned by ramzanniaz331. It was specialized through supervised fine-tuning on a mix of datasets (ramzan_5k_batch_1, ramzan_5k_batch_2, ramzan_openhermes, ramzan_metamath, and ramzan_aya_urdu) that spans general instruction data, mathematical reasoning, and Urdu-language material. The model targets general language understanding and generation tasks and supports a 32768-token context length for long interactions.
Overview
ramzanniaz331/llama3-8b-full-sft-v3 is an 8-billion-parameter language model developed by ramzanniaz331. It is a supervised fine-tuned (SFT) version of the ramzanniaz331/llama3.1-8b-8192-v3 base model, adapted for conversational and instruction-following use. The fine-tuning drew on a diverse collection of datasets, suggesting a broad range of intended applications.
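For reference, loading the model should follow the standard Transformers pattern for Llama-family checkpoints. The snippet below is a minimal sketch, assuming the repository is hosted on the Hugging Face Hub under the name above and ships a chat template; the dtype and generation settings are illustrative, not taken from the model card.

```python
# Minimal loading sketch (standard Transformers APIs; bf16 and the
# generation settings are assumptions, not stated in the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ramzanniaz331/llama3-8b-full-sft-v3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: common choice for 8B models
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain supervised fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```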
Key Training Details
The model was fine-tuned with a learning rate of 5e-06 for 2 epochs at a total training batch size of 64 across 8 GPUs. Training used the fused AdamW optimizer (adamw_torch_fused) and a cosine learning-rate scheduler with a warmup ratio of 0.03, and was run with Transformers 4.57.1, PyTorch 2.9.1+cu128, Datasets 4.0.0, and Tokenizers 0.22.1.
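Expressed as Hugging Face TrainingArguments, the reported hyperparameters would look roughly like the sketch below. Only the total batch size of 64 across 8 GPUs is stated, so the per-device batch size is an assumption, as are the output directory and precision setting.

```python
# Hypothetical reconstruction of the reported hyperparameters using
# transformers.TrainingArguments; per-device split and bf16 are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-sft-v3",  # assumed name
    learning_rate=5e-6,
    num_train_epochs=2,
    per_device_train_batch_size=8,  # 8 GPUs x 8 = total batch size of 64 (assumed split)
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,  # assumption: mixed precision not stated in the card
)
```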
Datasets Used for Fine-tuning
The fine-tuning process incorporated several distinct datasets:
- ramzan_5k_batch_1
- ramzan_5k_batch_2
- ramzan_openhermes
- ramzan_metamath
- ramzan_aya_urdu
This dataset mix suggests an effort to improve the model's performance across several domains, including general conversation, mathematical reasoning (ramzan_metamath), and multilingual, particularly Urdu, capability (ramzan_aya_urdu).
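If these corpora follow the usual Hub layout, combining them for a single SFT run might look like the sketch below. The repository paths (prefixed with the author's namespace) and the "train" split name are assumptions based on the dataset names; the actual training pipeline is not described in the card.

```python
# Hypothetical sketch of mixing the listed SFT corpora with the `datasets`
# library; repository paths and the "train" split are assumed.
from datasets import load_dataset, concatenate_datasets

dataset_names = [
    "ramzanniaz331/ramzan_5k_batch_1",
    "ramzanniaz331/ramzan_5k_batch_2",
    "ramzanniaz331/ramzan_openhermes",
    "ramzanniaz331/ramzan_metamath",
    "ramzanniaz331/ramzan_aya_urdu",
]

# Load each corpus, concatenate, and shuffle so domains are interleaved.
parts = [load_dataset(name, split="train") for name in dataset_names]
mixed = concatenate_datasets(parts).shuffle(seed=42)
print(mixed)
```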