CorticalStack/gemma-7b-ultrachat-sft

Text Generation · Concurrency Cost: 1 · Model Size: 8.5B · Quantization: FP8 · Context Length: 8k · Published: Feb 22, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

CorticalStack/gemma-7b-ultrachat-sft is an 8.5 billion parameter language model, fine-tuned from Google's Gemma-7B architecture. This model has undergone Supervised Fine-Tuning (SFT) using the UltraChat dataset, enhancing its conversational capabilities. It is specifically optimized for generating human-like responses in chat-based interactions, making it suitable for dialogue systems and conversational AI applications.


Model Overview

CorticalStack/gemma-7b-ultrachat-sft is derived from Google's Gemma-7B (the 7B Gemma checkpoint totals roughly 8.5 billion parameters once embeddings are counted). It has been fine-tuned via Supervised Fine-Tuning (SFT) on UltraChat, a large-scale multi-turn dialogue dataset designed to improve conversational ability.

Key Characteristics

  • Base Model: Fine-tuned from google/gemma-7b.
  • Training Data: Utilizes the stingning/ultrachat dataset for SFT, focusing on dialogue generation.
  • Parameter Count: 8.5 billion parameters.
  • Context Length: Fine-tuned with a maximum sequence length of 2048 tokens; the base Gemma-7B model supports an 8k context window.
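
For reference, here is a minimal sketch of loading and querying the checkpoint with Hugging Face transformers. It assumes the weights are hosted on the Hub under this repo id and that a GPU with enough memory for an 8.5B-parameter model is available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CorticalStack/gemma-7b-ultrachat-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 8.5B weights manageable
    device_map="auto",           # requires the accelerate package
)

prompt = "What are the benefits of supervised fine-tuning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```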

Fine-tuning Details

The model was fine-tuned with LoRA (Low-Rank Adaptation) using the following configuration:

  • LoRA r: 8
  • LoRA alpha: 16
  • LoRA dropout: 0.1

Training ran for 1 epoch with a per-device batch size of 4 and 6 gradient accumulation steps (an effective batch size of 24), using the paged_adamw_32bit optimizer and a constant learning rate of 2e-4 over 100 steps.
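
These hyperparameters map directly onto a peft LoraConfig and transformers TrainingArguments that would then be passed to a trainer such as trl's SFTTrainer. The sketch below is an approximation: the target_modules list is a typical choice for Gemma attention layers and is an assumption, as the exact modules used for this checkpoint are not stated.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings as reported on the model card.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not stated
)

# Training settings as reported; note that max_steps overrides
# num_train_epochs in the Hugging Face Trainer when both are set.
training_args = TrainingArguments(
    output_dir="gemma-7b-ultrachat-sft",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=6,  # effective batch size of 24
    optim="paged_adamw_32bit",
    learning_rate=2e-4,
    lr_scheduler_type="constant",
    max_steps=100,
)
```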

Ideal Use Cases

This model is particularly well-suited for applications requiring robust conversational AI, such as:

  • Chatbots and virtual assistants.
  • Dialogue generation in interactive systems.
  • Tasks benefiting from human-like conversational responses.
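
Because the model was tuned on multi-turn dialogue, chat-style prompting is its natural interface. Below is a minimal sketch, assuming the tokenizer ships a chat template (verify this before relying on it; otherwise fall back to plain prompting as shown earlier):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CorticalStack/gemma-7b-ultrachat-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Suggest a friendly opening line for a support chatbot."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```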