AmberYifan/llama3-8b-full-pretrain-mix-low-tweet-1m-en-sft
The AmberYifan/llama3-8b-full-pretrain-mix-low-tweet-1m-en-sft model is an 8 billion parameter language model, fine-tuned from AmberYifan/llama3-8b-full-pretrain-mix-low-tweet-1m-en. This model is specifically instruction-tuned on the alpaca_en dataset, making it suitable for general-purpose conversational AI and instruction-following tasks. It leverages the Llama 3 architecture and has a context length of 8192 tokens.
Overview
This model, llama3-8b-full-pretrain-mix-low-tweet-1m-en-sft, is an 8 billion parameter language model developed by AmberYifan. It is a fine-tuned variant of the AmberYifan/llama3-8b-full-pretrain-mix-low-tweet-1m-en base model, specifically optimized through supervised fine-tuning (SFT) on the alpaca_en dataset. This instruction-tuning process enhances its ability to follow commands and generate coherent, relevant responses based on given prompts.
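For quick experimentation, the model can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch: the Alpaca-style prompt template, the bfloat16 precision, and the generation settings are assumptions based on the alpaca_en fine-tuning data, not a documented interface of this model.

```python
# Minimal inference sketch using transformers.
# The Alpaca-style prompt template is an assumption (alpaca_en fine-tuning data);
# adjust it if the model card specifies a different format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/llama3-8b-full-pretrain-mix-low-tweet-1m-en-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; fits an 8B model on a single large GPU
    device_map="auto",
)

# Assumed Alpaca-style instruction prompt.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what supervised fine-tuning is in one sentence.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```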
Key Training Details
- Base Model: AmberYifan/llama3-8b-full-pretrain-mix-low-tweet-1m-en
- Fine-tuning Dataset: alpaca_en
- Learning Rate: 1e-05
- Optimizer: AdamW with cosine learning rate scheduler
- Epochs: 3.0
- Total Batch Size: 128 (across 8 GPUs with gradient accumulation; see the sketch below)
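These hyperparameters can be expressed as a transformers `TrainingArguments` configuration. The sketch below is a hedged reconstruction, not the original training script: the per-device batch size and gradient accumulation split are assumptions, since only the total batch size of 128 across 8 GPUs is reported.

```python
# Hypothetical reconstruction of the reported SFT hyperparameters.
# Only the totals above are documented; the per-device / accumulation split
# and bf16 precision are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-sft",
    learning_rate=1e-5,                 # reported learning rate
    lr_scheduler_type="cosine",         # reported cosine schedule
    optim="adamw_torch",                # reported AdamW optimizer
    num_train_epochs=3.0,               # reported epochs
    per_device_train_batch_size=2,      # assumed split
    gradient_accumulation_steps=8,      # 2 * 8 * 8 GPUs = 128 total batch size
    bf16=True,                          # assumed precision
)
```

With these arguments, the effective batch size is per-device batch size × gradient accumulation steps × number of GPUs, which matches the reported total of 128.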
Intended Use Cases
This model is primarily intended for applications requiring instruction-following capabilities in English. Its fine-tuning on the Alpaca dataset suggests suitability for:
- General-purpose chatbots
- Question answering
- Text generation based on specific instructions
- Prototyping and development of conversational AI systems