Overview
ale-bay/zephyr-2b-gemma-sft is a 2.6 billion parameter language model derived from Google's Gemma-2B architecture. It has been instruction-tuned on the HuggingFaceH4/deita-10k-v0-sft dataset to improve performance on instruction-following tasks. Training ran for 3 epochs and reached a final validation loss of 1.0529.
Key Capabilities
- Instruction Following: Fine-tuned on a supervised instruction dataset to improve response generation based on given prompts.
- General Language Generation: Suitable for a range of text generation tasks due to its foundational Gemma architecture and instruction tuning.
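Since the model is instruction-tuned on a Gemma base, prompts are typically wrapped in Gemma-style chat-turn markers before generation. The helper below is a minimal sketch of that formatting; it assumes the standard Gemma turn tokens, which may differ from this checkpoint's actual chat template (consult the tokenizer's `apply_chat_template` for the authoritative format).

```python
def format_prompt(user_message: str) -> str:
    """Wrap a user message in Gemma-style chat-turn markers (assumed format)."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

# Hypothetical usage: the resulting string is what would be passed
# to the tokenizer for generation.
prompt = format_prompt("Summarize the Gemma architecture in one sentence.")
print(prompt)
```

In practice, prefer `tokenizer.apply_chat_template(...)` from the transformers library, which reads the template stored with the model rather than hard-coding it.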
Training Details
The model was trained with a learning rate of 2e-05, a total effective batch size of 128, and the Adam optimizer. Training ran across 8 GPUs with a cosine learning-rate scheduler and a warmup ratio of 0.1.
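The cosine schedule with warmup described above can be sketched as follows. This is a generic illustration, not the exact trainer implementation; the total step count is a placeholder, and only the peak learning rate (2e-05) and warmup ratio (0.1) come from the reported configuration.

```python
import math

PEAK_LR = 2e-05       # reported learning rate
WARMUP_RATIO = 0.1    # reported warmup ratio

def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward zero."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        # Linear ramp over the first 10% of training.
        return PEAK_LR * step / max(1, warmup_steps)
    # Cosine decay over the remaining 90% of steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * progress))
```

For example, with a hypothetical 1000 total steps, the rate ramps up over the first 100 steps, peaks at 2e-05, and decays to zero by the end of training.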
Intended Use Cases
- Prototyping: Can serve as a base for further fine-tuning on more specific datasets.
- Research: Useful for exploring the effects of instruction tuning on Gemma-2B with the deita-10k-v0-sft dataset.