UWNSL/Llama3.1-3B-Instruct_Mix-Long Overview
This model is a fine-tuned variant of meta-llama/Llama-3.2-3B-Instruct, developed by UWNSL. It is a 3.2-billion-parameter instruction-following language model, distinguished by its 32768-token context window, which lets it process significantly longer inputs than many comparably sized models.
Key Characteristics
- Base Model: Fine-tuned from meta-llama/Llama-3.2-3B-Instruct.
- Parameter Count: 3.2 billion parameters.
- Context Length: Supports an extended context of 32768 tokens.
- Training Data: Fine-tuned on the Mix-Long_long_0.2_short_0.8 dataset.
- Training Loss: Achieved a loss of 0.2421 on the evaluation set.
Intended Use Cases
Given its instruction-tuned nature and extended context window, this model is well-suited for applications requiring:
- General instruction following: Responding to a wide range of prompts and commands.
- Long-form text processing: Summarization, analysis, or generation of documents, articles, or code with substantial length.
- Conversational AI: Maintaining context over extended dialogues.
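For long-form processing, inputs still need to fit the 32768-token window while leaving room for the generated output. The sketch below is one illustrative way to split an already-tokenized document into overlapping chunks that respect that budget; the function name, the 512-token generation reserve, and the overlap size are assumptions for illustration, not part of the model card.

```python
def chunk_tokens(token_ids, context_window=32768, max_new_tokens=512, overlap=256):
    """Split a list of token ids into chunks that each fit the model's
    context window, reserving max_new_tokens of room for generation.
    Consecutive chunks overlap by `overlap` tokens so no sentence is
    cut off without context. (Illustrative helper; not from the card.)
    """
    budget = context_window - max_new_tokens  # prompt tokens allowed per chunk
    step = budget - overlap                   # stride between chunk starts
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + budget])
        if start + budget >= len(token_ids):
            break
    return chunks
```

Each chunk can then be summarized independently and the partial summaries combined, a common pattern for documents that exceed even an extended context window.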
Training Details
The model was trained for 2 epochs with the AdamW optimizer, a learning rate of 1e-05, and a total train batch size of 80 (achieved via gradient accumulation). Training used Transformers 4.46.1 and PyTorch 2.6.0+cu124.
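The total batch size of 80 is the product of the per-device batch size, the number of gradient-accumulation steps, and the device count. The card does not state how that product is factored, so the 10 × 8 × 1 split below is only one possible configuration, shown to make the arithmetic concrete.

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Effective (total) train batch size when using gradient accumulation:
    gradients from `accumulation_steps` micro-batches of `per_device_batch`
    examples on each of `num_devices` devices are summed before one
    optimizer step, so the update sees their product in examples."""
    return per_device_batch * accumulation_steps * num_devices

# One illustrative factorization of the card's total batch size of 80
# (the actual per-device batch / accumulation split is not stated):
total = effective_batch_size(per_device_batch=10, accumulation_steps=8, num_devices=1)
```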