migtissera/Tess-v2.5.2-Qwen2-72B

Text generation · Concurrency cost: 4 · Model size: 72.7B · Quant: FP8 · Context length: 32k · Published: Jun 13, 2024 · License: qwen2 · Architecture: Transformer

Tess-v2.5.2-Qwen2-72B is a 72-billion-parameter large language model developed by Migel Tissera, fine-tuned on the Qwen2-72B base. The model shows significant improvements in reasoning, coding, and mathematics, ranking #1 among open-weight models on MMLU evaluations. It is designed to give detailed answers and hold natural conversations, including asking intentional follow-up questions, and is suited to complex analytical and generative tasks.


Tess-v2.5.2-Qwen2-72B Overview

Tess-v2.5.2 is the latest iteration in the Tess series of Large Language Models, developed by Migel Tissera. This model is a fine-tune over the Qwen2-72B base, utilizing a subset of the Tess-v2.5 dataset, which comprises 300K synthetically generated samples covering diverse topics like business, STEM, and computer programming. The dataset was created using frontier models such as GPT-4-Turbo, Claude-Opus, and Mistral-Large.

Key Capabilities & Differentiators

  • Exceptional Performance: Ranks #1 among open-weight models on MMLU, outperforming Qwen2-72B-Instruct, Llama3-70B-Instruct, Mixtral-8x22B-Instruct, and DBRX-Instruct. It also surpasses frontier closed models like Gemini-1.0-Ultra, Gemini-1.5-Pro, Mistral-Large, and Claude-3-Sonnet on MMLU.
  • Enhanced Reasoning: Shows significant improvements in general reasoning, coding, and mathematical capabilities.
  • Natural Conversation Flow: Intentionally designed to ask follow-up questions for a more natural conversational experience, with proper stop token generation ensuring controlled responses.
  • Training Methodology: Fine-tuned with low learning rates and a low number of epochs on high-quality, diverse data, preserving the entropy of the base model.

Use Cases & Considerations

This model is well-suited for applications requiring advanced reasoning, detailed responses, and complex problem-solving in areas like coding and mathematics. It uses the ChatML prompt format. Users should be aware that it is an uncensored model and, while aiming for accuracy, can occasionally produce inaccurate or biased content. The model has not been aligned with RLHF or DPO.
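Since the model uses the ChatML prompt format, a minimal sketch of assembling such a prompt might look like the following; the helper name and the example conversation are illustrative, not taken from the model card:

```python
def build_chatml_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Assemble a ChatML-formatted prompt string.

    Each turn is a (role, content) pair, e.g. ("user", "Hello").
    The final assistant header is left open so the model generates
    the reply and emits its own <|im_end|> stop token.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, content in turns:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt(
    "You are Tess, a helpful assistant.",
    [("user", "Explain the Pythagorean theorem.")],
)
```

In practice, a chat template shipped with the tokenizer (e.g. via Hugging Face `apply_chat_template`) would typically handle this formatting for you.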

Popular Sampler Settings

The sampler parameters most commonly tuned by Featherless users for this model are:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
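Since the page lists which sampler parameters users tune but not their values, here is a hedged sketch of how such settings might be passed in a request body to an OpenAI-compatible chat completions endpoint; every numeric value below is an illustrative assumption, not a published configuration:

```python
# Illustrative request payload for an OpenAI-compatible chat completions API.
# The sampler values shown are placeholders chosen for demonstration only.
payload = {
    "model": "migtissera/Tess-v2.5.2-Qwen2-72B",
    "messages": [
        {"role": "user", "content": "Summarize the attention mechanism in one paragraph."}
    ],
    "temperature": 0.7,          # randomness of token sampling
    "top_p": 0.9,                # nucleus sampling cutoff
    "top_k": 40,                 # restrict sampling to the 40 most likely tokens
    "frequency_penalty": 0.0,    # penalize tokens by how often they already appeared
    "presence_penalty": 0.0,     # penalize tokens that appeared at all
    "repetition_penalty": 1.1,   # discourage verbatim repetition
    "min_p": 0.05,               # drop tokens below 5% of the top token's probability
}
```

This dictionary would be serialized as the JSON body of a POST request to the provider's chat completions endpoint.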