EVA-Qwen2.5-14B-v0.1 Overview
EVA-Qwen2.5-14B-v0.1 is a 14.8 billion parameter, full-parameter fine-tune of the Qwen2.5 architecture, developed by Kearm and Auri. Version 0.1 uses a deduplicated and cleaned dataset relative to its predecessor and a longer training sequence length, resulting in improved stability and better handling of short inputs and min_p sampling.
Key Capabilities & Features
- Roleplay and Storywriting Specialization: Fine-tuned specifically for generating creative and versatile narrative content.
- Expanded Data Mixture: Utilizes an enhanced version of the Celeste 70B 0.1 data mixture, supplemented with datasets like Kalomaze's Opus_Instruct_25k, ChatGPT-4o-WritingPrompts, Sonnet3.5-Charcards-Roleplay, shortstories_synthlabels, Synthstruct, and SynthRP.
- High Context Length: Supports a 131,072-token (128K) context window.
- ChatML Prompt Format: Designed for the ChatML prompting standard; see the sketch after this list.
- Optimized for Creativity: The training data and fine-tuning process aim to enhance the model's creative output and narrative 'flavor'.
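To make the ChatML format concrete, the sketch below builds a prompt with transformers' chat-template API. The Hugging Face repo ID is an assumption inferred from the model name, not something stated in this card.

```python
# A minimal sketch of building a ChatML prompt with transformers.
# NOTE: the repo ID below is an assumption inferred from the model name.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1")
messages = [
    {"role": "system", "content": "You are a narrator for an interactive story."},
    {"role": "user", "content": "Describe the harbor at dawn."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# ChatML output has the shape:
# <|im_start|>system
# You are a narrator for an interactive story.<|im_end|>
# <|im_start|>user
# Describe the harbor at dawn.<|im_end|>
# <|im_start|>assistant
```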
Training Details
The model was trained for 3 days on 4x A6000 GPUs. Note that a quantized KV cache is not recommended with Qwen2.5, as it can degrade output quality; the f16 KV cache is light enough to keep as-is (see the sketch below).
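As one hedged example of keeping the KV cache at f16, here is how a GGUF quant of the model might be loaded with llama-cpp-python. The file name is hypothetical, and the KV cache type is simply left at its f16 default rather than quantized.

```python
# A sketch of loading a (hypothetical) GGUF quant with llama-cpp-python,
# keeping the KV cache at its default f16 precision per the note above.
from llama_cpp import Llama

llm = Llama(
    model_path="EVA-Qwen2.5-14B-v0.1-Q5_K_M.gguf",  # hypothetical filename
    n_ctx=32768,  # a slice of the 131,072-token window; raise if VRAM allows
    # Deliberately NOT requesting quantized KV cache types (e.g. q4_0/q8_0):
    # per the model card, quantized KV cache degrades Qwen2.5 output quality,
    # while the default f16 cache is light enough.
)
```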
Recommended Usage
For optimal performance, the developers recommend the following sampler values (a hedged example of applying them follows the list):
- Temperature: 1
- Typical-P: 0.9
- Min-P: 0.05
- Top-A: 0.2
- Repetition Penalty: 1.03
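Below is a sketch of passing these values to an OpenAI-compatible backend that exposes extended samplers (for example TabbyAPI or Aphrodite); the endpoint URL, model name, and extra field names are assumptions, since stock OpenAI endpoints do not accept typical_p, min_p, or top_a.

```python
# A sketch of applying the recommended samplers through an OpenAI-compatible
# completions endpoint. ASSUMPTIONS: the backend (e.g. TabbyAPI/Aphrodite)
# accepts typical_p/min_p/top_a; the URL and model name are placeholders.
import requests

payload = {
    "model": "EVA-Qwen2.5-14B-v0.1",
    "prompt": (
        "<|im_start|>user\nWrite an opening scene.<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
    "max_tokens": 512,
    "temperature": 1.0,
    "typical_p": 0.9,
    "min_p": 0.05,
    "top_a": 0.2,
    "repetition_penalty": 1.03,
}
resp = requests.post("http://localhost:5000/v1/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```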