Tess-R1 Limerick (Llama-3.1-70B) Overview

Tess-R1 Limerick is a 70 billion parameter model from the Tess-Reasoning-1 series, built on the Llama 3.1 architecture. Its core innovation lies in its test-time compute capabilities, designed to perform detailed reasoning before generating a final response. This model is trained to engage in a multi-stage thought process, making it particularly adept at complex problem-solving.

Key Capabilities & Features

Chain-of-Thought (CoT) Reasoning: The model first thinks step-by-step, indicated by <thinking> tags, to devise a reasoning path.
Contemplation: It can contemplate its answers using <contemplation> tags, allowing for self-correction and refinement.
Alternative Generation: The model can suggest alternative solutions or approaches, marked by <alternatively> tags.
Structured Output: The final answer is encapsulated within <output> tags, ensuring clear separation from the reasoning process.
Llama3 Prompt Format: Utilizes the standard Llama3 prompt structure for input.
Specific System Message: Requires a precise system message to activate its full reasoning capabilities.

Performance Highlights

Evaluations, performed by extracting content within <output> tags, show competitive performance:

GPQA: 41.5%
MMLU: 81.6%
MATH: 64.2%
MMLU-Pro: 65.6%
HumanEval: 61.0%

When to Use This Model

This model is ideal for use cases demanding transparent and verifiable reasoning, where the process of arriving at an answer is as important as the answer itself. It's particularly suited for applications requiring:

Complex problem-solving where step-by-step logic is beneficial.
Decision support systems that need to show their thought process.
Educational tools that can demonstrate reasoning.
Tasks requiring deliberation and alternative suggestions.

Users should be aware that in multi-turn conversations, only the content within the <output> tags should be carried forward to maintain coherence and prevent out-of-distribution inputs. The model can be prompted with a <thinking> tag to ensure it utilizes its full XML-tagged reasoning capabilities.