Overview
Tess-v2.5-Qwen2-72B: An Advanced Conversational LLM
migtissera's Tess-v2.5-Qwen2-72B is a 72.7-billion-parameter large language model fine-tuned from the Qwen2-72B base. This model is part of the Tess series, known for its focus on preserving the base model's entropy during fine-tuning. It demonstrates significant improvements across several key areas:
Key Capabilities & Performance
- Superior Reasoning, Coding, and Mathematics: Tess-v2.5 shows enhanced capabilities in these critical domains.
- Top-tier MMLU Performance: It is ranked as the #1 open-weight model on MMLU (Massive Multitask Language Understanding), surpassing models like Qwen2-72B-Instruct, Llama3-70B-Instruct, Mixtral-8x22B-Instruct, and DBRX-Instruct. Notably, it also outperforms closed models such as Gemini-1.0-Ultra, Gemini-1.5-Pro, Mistral-Large, and Claude-3-Sonnet on MMLU.
- AGIEval Comparison: The model compares favorably with GPT-4-0314 on a subset of AGIEval (Nous).
- Unique Conversational Feature: Tess-v2.5.2 (an updated version) is designed to ask follow-up questions for a more natural conversation flow, a feature that can be disabled via the system prompt.
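The follow-up-question behaviour above is toggled through the system prompt. A minimal sketch of how an application might expose that toggle is shown below; the exact opt-out wording is a hypothetical placeholder, as the model card does not specify the phrasing the model was trained to respect.

```python
def build_messages(user_prompt: str, ask_followups: bool = True) -> list[dict]:
    """Build a chat message list for Tess-v2.5.2.

    The follow-up-question behaviour is steered via the system prompt;
    the opt-out instruction below is a hypothetical example, not the
    wording from the model card.
    """
    system = "You are Tess, a helpful AI assistant."
    if not ask_followups:
        # Hypothetical opt-out instruction; consult the model card for
        # the exact phrasing.
        system += " Do not ask follow-up questions; answer directly."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]
```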
Training & Dataset
- Fine-tuned using the Tess-v2.5 dataset, comprising 300K synthetically generated samples covering diverse topics including business, management, marketing, history, social sciences, arts, STEM, and computer programming.
- The dataset was created using the Sensei framework, leveraging frontier models like GPT-4-Turbo, Claude-Opus, and Mistral-Large.
- Training was carried out with Axolotl, using low learning rates and high-quality, diverse data, on a 4xA100 VM over 4 days. The model has not been aligned with RLHF or DPO.
Considerations
- The model uses the ChatML prompt format.
- It is an uncensored model, and users should exercise caution as it may occasionally produce inaccurate, inappropriate, or biased content.
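Since the model expects ChatML, each turn is wrapped in `<|im_start|>`/`<|im_end|>` markers and the assistant turn is left open for generation. A minimal formatter sketch (role names follow the ChatML convention; treating `<|im_end|>` as the stop sequence is the usual setup for ChatML models):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render a list of {role, content} messages in ChatML and open the
    assistant turn so the model continues from there."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Open the assistant turn; generation should stop at <|im_end|>.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are Tess, a helpful AI assistant."},
    {"role": "user", "content": "Summarize the Tess-v2.5 training setup."},
])
```

In practice, libraries such as Transformers can apply this template automatically from the tokenizer's chat template, so the manual formatter is only needed when building prompts by hand.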