migtissera/Tess-v2.5-Qwen2-72B

Public · 72.7B parameters · FP8 · 32768 context length · Jun 12, 2024
License: qwen2

Overview

Tess-v2.5-Qwen2-72B: An Advanced Conversational LLM

migtissera's Tess-v2.5-Qwen2-72B is a 72.7-billion-parameter large language model fine-tuned from the Qwen2-72B base. It is part of the Tess series, which is known for preserving the base model's entropy during fine-tuning, and it demonstrates significant improvements across several key areas:

Key Capabilities & Performance

  • Superior Reasoning, Coding, and Mathematics: Tess-v2.5 shows enhanced capabilities in each of these critical domains relative to its base model.
  • Top-tier MMLU Performance: It is ranked as the #1 open-weight model on MMLU (Massive Multitask Language Understanding), surpassing models like Qwen2-72B-Instruct, Llama3-70B-Instruct, Mixtral-8x22B-Instruct, and DBRX-Instruct. Notably, it also outperforms closed models such as Gemini-1.0-Ultra, Gemini-1.5-Pro, Mistral-Large, and Claude-3-Sonnet on MMLU.
  • AGIEval Comparison: The model compares favorably with GPT-4-0314 on a subset of AGIEval (Nous).
  • Unique Conversational Feature: Tess-v2.5.2 (an updated version) is designed to ask follow-up questions for a more natural conversation flow, a feature that can be disabled via the system prompt.

Training & Dataset

  • Fine-tuned using the Tess-v2.5 dataset, comprising 300K synthetically generated samples covering diverse topics including business, management, marketing, history, social sciences, arts, STEM, and computer programming.
  • The dataset was created using the Sensei framework, leveraging frontier models like GPT-4-Turbo, Claude-Opus, and Mistral-Large.
  • The model was fine-tuned with Axolotl on a 4xA100 VM over 4 days, using low learning rates and high-quality, diverse data. It has not been aligned with RLHF or DPO.

Considerations

  • The model uses the ChatML prompt format.
  • It is an uncensored model; users should exercise caution, as it may occasionally produce inaccurate, inappropriate, or biased content.
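The ChatML format named above wraps each conversation turn in `<|im_start|>`/`<|im_end|>` delimiters. A minimal sketch of assembling such a prompt follows; the specific system-prompt wording used to suppress follow-up questions is an assumption for illustration, since the card only states that the feature can be disabled via the system prompt.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Wrap a system message and one user turn in ChatML delimiters,
    ending with an open assistant turn for the model to complete."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Hypothetical system prompt: exact phrasing to disable follow-up
# questions is not specified in the model card.
prompt = build_chatml_prompt(
    "You are Tess, a helpful assistant. Do not ask follow-up questions.",
    "Explain the difference between a process and a thread.",
)
print(prompt)
```

The resulting string can be passed directly to any completion endpoint serving the model; alternatively, the tokenizer's built-in chat template can produce the same structure from a list of role/content messages.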