migtissera/Tess-v2.5-Qwen2-72B

Public · 72.7B parameters · FP8 · 32768 context length · Jun 12, 2024
License: qwen2

Overview

Tess-v2.5-Qwen2-72B: An Advanced Conversational LLM

migtissera's Tess-v2.5-Qwen2-72B is a 72.7-billion-parameter large language model fine-tuned from the Qwen2-72B base. It is part of the Tess series, which is known for preserving the base model's entropy during fine-tuning, and it demonstrates significant improvements across several key areas:

Key Capabilities & Performance

  • Superior Reasoning, Coding, and Mathematics: Tess-v2.5 shows enhanced capabilities in each of these critical domains relative to its base model.
  • Top-tier MMLU Performance: It is ranked as the #1 open-weight model on MMLU (Massive Multitask Language Understanding), surpassing models like Qwen2-72B-Instruct, Llama3-70B-Instruct, Mixtral-8x22B-Instruct, and DBRX-Instruct. Notably, it also outperforms closed models such as Gemini-1.0-Ultra, Gemini-1.5-Pro, Mistral-Large, and Claude-3-Sonnet on MMLU.
  • AGIEval Comparison: The model compares favorably with GPT-4-0314 on a subset of AGIEval (Nous).
  • Unique Conversational Feature: Tess-v2.5.2 (an updated version) is designed to ask follow-up questions for a more natural conversation flow, a feature that can be disabled via the system prompt.

Training & Dataset

  • Fine-tuned using the Tess-v2.5 dataset, comprising 300K synthetically generated samples covering diverse topics including business, management, marketing, history, social sciences, arts, STEM, and computer programming.
  • The dataset was created using the Sensei framework, leveraging frontier models like GPT-4-Turbo, Claude-Opus, and Mistral-Large.
  • The model was fine-tuned with Axolotl on a 4xA100 VM over 4 days, using low learning rates and high-quality, diverse data. It has not been aligned with RLHF or DPO.

Considerations

  • The model uses the ChatML prompt format.
  • It is an uncensored model; users should exercise caution, as it may occasionally produce inaccurate, inappropriate, or biased content.
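The ChatML format named above wraps each conversation turn in `<|im_start|>`/`<|im_end|>` delimiters. A minimal sketch of assembling such a prompt follows; the specific system-prompt wording used to suppress follow-up questions is an assumption for illustration, since the card only states that the feature can be disabled via the system prompt.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Wrap a system message and one user turn in ChatML delimiters,
    ending with an open assistant turn for the model to complete."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Hypothetical system prompt: exact phrasing to disable follow-up
# questions is not specified in the model card.
prompt = build_chatml_prompt(
    "You are Tess, a helpful assistant. Do not ask follow-up questions.",
    "Explain the difference between a process and a thread.",
)
print(prompt)
```

The resulting string can be passed directly to any completion endpoint serving the model; alternatively, the tokenizer's built-in chat template can produce the same structure from a list of role/content messages.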