Model Overview
EVA-Qwen2.5-72B-v0.2 is a 72.7-billion-parameter, full-parameter fine-tune of the Qwen2.5 architecture, developed by Kearm, Auri, and Cahvay. Version 0.2 features optimized training hyperparameters and an increased sequence length, yielding better instruction following and less repetition in long contexts. It is designed as a specialist model for roleplay and storywriting, aiming for greater versatility, creativity, and "flavor" in generated text.
Key Capabilities
- Roleplay and Storywriting Specialization: Fine-tuned on a diverse mixture of synthetic and natural data, including the Celeste 70B 0.1 data mixture, Kalomaze's Opus_Instruct_25k, and subsets of ChatGPT-4o-WritingPrompts and Sonnet3.5-Charcards-Roleplay.
- Extended Context Handling: Supports a 131,072-token (128K) context length, allowing for deeper and more consistent narrative generation.
- Improved Instruction Following: Version 0.2 specifically improves the model's ability to follow instructions, particularly within extended contexts.
- Reduced Repetition: Training optimizations in v0.2 aim to minimize repetitive outputs, contributing to more natural and engaging text.
Training Details
The model was trained for 17 hours on 8x NVIDIA H100 SXM GPUs. The training data mixture was significantly expanded from the Celeste 70B 0.1 base, incorporating several specialized datasets to improve its creative writing and roleplay capabilities. The prompt format is ChatML.
Recommended Usage
For optimal performance, the developers recommend specific sampler values: Temperature: 0.8, Min-P: 0.05, Top-A: 0.3, and Repetition Penalty: 1.03. A dedicated SillyTavern preset is also provided for users of that platform.