mookiezi/Discord-Micae-Hermes-3-3B

Text generation · Model size: 3.2B · Quant: BF16 · Context length: 32k · License: llama3 · Architecture: Transformer

Discord-Micae-Hermes-3-3B by mookiezi is a 3.2-billion-parameter language model fine-tuned from NousResearch/Hermes-3-Llama-3.2-3B with a 32,768-token context length. The model is optimized for casual, human-like dialogue and was trained on Discord conversation data, making it well suited to chatbot and video game dialogue generation focused on synthetic dialogue aligned with natural human expression.


Overview

Discord-Micae-Hermes-3-3B is a 3.2-billion-parameter language model developed by mookiezi, fine-tuned from the NousResearch/Hermes-3-Llama-3.2-3B base model. Its primary focus is generating casual, human-like dialogue by leveraging a specialized dataset of Discord conversations. The model was trained over 17 days on a GTX 1080, using a LoRA merge fine-tuning method across multiple epochs with separate training schedules for single-turn and multi-turn exchanges.
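
As a rough illustration of the LoRA merge approach (not the author's exact training script), a fine-tune of this kind is typically set up with the peft library and the adapter weights are merged back into the base model afterwards. The rank, alpha, dropout, and target modules below are assumptions made for the sketch.

```python
# Illustrative sketch of LoRA fine-tuning plus merge with peft; the rank,
# alpha, dropout, and target modules are assumptions, not the card's settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("NousResearch/Hermes-3-Llama-3.2-3B")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# ... train the adapter on the Discord conversation dataset here ...

# Merge the LoRA weights into the base model and drop the adapter wrappers,
# producing a standalone checkpoint like the released model.
merged = model.merge_and_unload()
merged.save_pretrained("Discord-Micae-Hermes-3-3B")
```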

Key Capabilities

  • Generates dialogue with a casual, human-like tone.
  • Supports experimentation with dialogue agents trained on Discord data.
  • Serves as a base model for natural text generation in video game dialogue.
  • Utilizes the ChatML prompt format, handling context and chat history effectively (see the usage sketch below).
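
A minimal sketch of prompting the model through a ChatML chat template with transformers follows. It assumes the tokenizer ships a ChatML template, as Hermes-3 tokenizers do, and the sampling settings are illustrative rather than recommendations from the model card.

```python
# Minimal sketch: load the model and generate a reply via the tokenizer's
# ChatML chat template. Generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mookiezi/Discord-Micae-Hermes-3-3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# ChatML-style chat history; multi-turn context goes in the same list.
messages = [
    {"role": "user", "content": "hey, what are you up to tonight?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```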

Limitations and Considerations

  • Inherits potential biases from Discord-style language.
  • Not safety-aligned for deployment without moderation.
  • Not intended for factual or sensitive information retrieval, despite inheriting knowledge from its base model.

Training Details

The model was fine-tuned on the mookiezi/Discord-OpenMicae dataset. Training followed a multi-phase schedule, including 17M tokens of single-turn exchanges and 5.5M tokens of multi-turn chains, followed by an epoch over the combined dataset. It used torch.optim.AdamW with a cosine learning-rate scheduler and warmup steps.
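
As a hedged sketch of that optimizer setup (the learning rate, weight decay, warmup steps, and total steps below are placeholders, not values from the card), the AdamW plus cosine-with-warmup combination looks roughly like this:

```python
# Sketch of AdamW with a cosine-with-warmup schedule; all hyperparameters
# here are illustrative placeholders, not the card's actual values.
import torch
from transformers import get_cosine_schedule_with_warmup

# Placeholder module standing in for the language model; a real run would
# pass the (LoRA-wrapped) Hermes-3 model's parameters instead.
model = torch.nn.Linear(8, 8)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,
    num_training_steps=10_000,
)

# Each step updates the weights, then advances the cosine schedule.
for step in range(3):
    loss = model(torch.randn(4, 8)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```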