migtissera/Tess-v2.5.2-Qwen2-72B

Text generation · Concurrency cost: 4 · Model size: 72.7B · Quant: FP8 · Context length: 32k · Published: Jun 13, 2024 · License: qwen2 · Architecture: Transformer

Tess-v2.5.2-Qwen2-72B is a 72-billion-parameter large language model developed by Migel Tissera, fine-tuned on the Qwen2-72B base. The model shows significant improvements in reasoning, coding, and mathematics, ranking #1 among open-weight models on MMLU evaluations. It is designed to give detailed answers and hold natural conversations, including asking intentional follow-up questions, and is suited to complex analytical and generative tasks.


Tess-v2.5.2-Qwen2-72B Overview

Tess-v2.5.2 is the latest iteration in the Tess series of Large Language Models, developed by Migel Tissera. This model is a fine-tune over the Qwen2-72B base, utilizing a subset of the Tess-v2.5 dataset, which comprises 300K synthetically generated samples covering diverse topics like business, STEM, and computer programming. The dataset was created using frontier models such as GPT-4-Turbo, Claude-Opus, and Mistral-Large.

Key Capabilities & Differentiators

  • Exceptional Performance: Ranks #1 among open-weight models on MMLU, outperforming Qwen2-72B-Instruct, Llama3-70B-Instruct, Mixtral-8x22B-Instruct, and DBRX-Instruct. It also surpasses frontier closed models like Gemini-1.0-Ultra, Gemini-1.5-Pro, Mistral-Large, and Claude-3-Sonnet on MMLU.
  • Enhanced Reasoning: Shows significant improvements in general reasoning, coding, and mathematical capabilities.
  • Natural Conversation Flow: Intentionally designed to ask follow-up questions for a more natural conversational experience, with proper stop token generation ensuring controlled responses.
  • Training Methodology: Fine-tuned with low learning rates and a low number of epochs on high-quality, diverse data, preserving the entropy of the base model.

Use Cases & Considerations

This model is well-suited for applications requiring advanced reasoning, detailed responses, and complex problem-solving in areas like coding and mathematics. It uses the ChatML prompt format. Users should be aware that it is an uncensored model and, while aiming for accuracy, can occasionally produce inaccurate or biased content. The model has not been aligned with RLHF or DPO.
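Since the model uses the ChatML prompt format, a minimal sketch of assembling such a prompt might look like the following; the helper name and the example conversation are illustrative, not taken from the model card:

```python
def build_chatml_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Assemble a ChatML-formatted prompt string.

    Each turn is a (role, content) pair, e.g. ("user", "Hello").
    The final assistant header is left open so the model generates
    the reply and emits its own <|im_end|> stop token.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, content in turns:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt(
    "You are Tess, a helpful assistant.",
    [("user", "Explain the Pythagorean theorem.")],
)
```

In practice, a chat template shipped with the tokenizer (e.g. via Hugging Face `apply_chat_template`) would typically handle this formatting for you.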

Popular Sampler Settings

The sampler parameters most commonly tuned by Featherless users for this model are:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
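Since the page lists which sampler parameters users tune but not their values, here is a hedged sketch of how such settings might be passed in a request body to an OpenAI-compatible chat completions endpoint; every numeric value below is an illustrative assumption, not a published configuration:

```python
# Illustrative request payload for an OpenAI-compatible chat completions API.
# The sampler values shown are placeholders chosen for demonstration only.
payload = {
    "model": "migtissera/Tess-v2.5.2-Qwen2-72B",
    "messages": [
        {"role": "user", "content": "Summarize the attention mechanism in one paragraph."}
    ],
    "temperature": 0.7,          # randomness of token sampling
    "top_p": 0.9,                # nucleus sampling cutoff
    "top_k": 40,                 # restrict sampling to the 40 most likely tokens
    "frequency_penalty": 0.0,    # penalize tokens by how often they already appeared
    "presence_penalty": 0.0,     # penalize tokens that appeared at all
    "repetition_penalty": 1.1,   # discourage verbatim repetition
    "min_p": 0.05,               # drop tokens below 5% of the top token's probability
}
```

This dictionary would be serialized as the JSON body of a POST request to the provider's chat completions endpoint.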