CorticalStack/gemma-7b-ultrachat-sft
Model Overview
CorticalStack/gemma-7b-ultrachat-sft is an 8.5-billion-parameter language model derived from Google's Gemma-7B. It was adapted with Supervised Fine-Tuning (SFT) on the UltraChat dataset to strengthen its conversational abilities, making it well suited to dialogue systems and conversational AI applications.
Key Characteristics
- Base Model: Fine-tuned from google/gemma-7b.
- Training Data: Utilizes the stingning/ultrachat dataset for SFT, focusing on dialogue generation.
- Parameter Count: 8.5 billion parameters.
- Context Length: Supports a maximum sequence length of 2048 tokens during fine-tuning.
Fine-tuning Details
The model was fine-tuned with LoRA (Low-Rank Adaptation) using the following configuration:
- LoRA r: 8
- LoRA alpha: 16
- LoRA dropout: 0.1
Training ran for one epoch with a per-device batch size of 4, gradient accumulation over 6 steps, and the paged_adamw_32bit optimizer, at a constant learning rate of 0.0002 over 100 steps; a sketch of this setup appears below.
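These hyperparameters map naturally onto a PEFT + TRL training script. The sketch below is a reconstruction under stated assumptions, not the author's actual script: the dialogue formatting, the bfloat16 dtype, and the reading of "100 steps" as warmup steps are all assumptions, and the SFTTrainer signature varies across TRL versions.

```python
# Reconstruction of the SFT setup from the hyperparameters listed above.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "google/gemma-7b"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# stingning/ultrachat stores each conversation as a list of alternating
# user/assistant turns in its "data" field; flatten each into one string.
# The turn markers below are illustrative, not the card's actual template.
def to_text(example):
    parts = []
    for i, turn in enumerate(example["data"]):
        role = "user" if i % 2 == 0 else "assistant"
        parts.append(f"<|{role}|>\n{turn}")
    return {"text": "\n".join(parts)}

dataset = load_dataset("stingning/ultrachat", split="train").map(to_text)

# LoRA configuration as documented above.
peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="gemma-7b-ultrachat-sft",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=6,
    optim="paged_adamw_32bit",
    learning_rate=2e-4,
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=100,  # assumption: the card's ambiguous "100 steps" read as warmup
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    dataset_text_field="text",
    peft_config=peft_config,
    max_seq_length=2048,  # matches the fine-tuning context length
    tokenizer=tokenizer,
)
trainer.train()
```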
Ideal Use Cases
This model is particularly well-suited for applications requiring robust conversational AI, such as:
- Chatbots and virtual assistants.
- Dialogue generation in interactive systems.
- Tasks benefiting from human-like conversational responses.
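For these use cases, the model loads like any causal LM on the Hugging Face Hub. The following is a minimal inference sketch using the Transformers library; the generation parameters, dtype, and plain-text prompt format are illustrative assumptions, since the card does not specify a chat template.

```python
# Minimal inference sketch for CorticalStack/gemma-7b-ultrachat-sft.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CorticalStack/gemma-7b-ultrachat-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 for memory efficiency
    device_map="auto",
)

prompt = "What are some tips for staying productive while working from home?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep prompt plus completion within the 2048-token window used in fine-tuning.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```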