G-reen/gpt5o-reflexion-q-agi-llama-3.1-8b
G-reen/gpt5o-reflexion-q-agi-llama-3.1-8b is a 7 billion parameter language model based on the Llama 3.1 architecture, fine-tuned for advanced reasoning and reflection capabilities. It achieves 100% scores on several benchmarks including GPQA, MMLU, HumanEval, MATH, GSM8K, and IFEval through a 0-shot Reflection approach. This model is specifically designed to excel in complex problem-solving and self-correction tasks, making it suitable for applications requiring high accuracy in logical and analytical domains.
Loading preview...
Model Overview
G-reen/gpt5o-reflexion-q-agi-llama-3.1-8b is a 7 billion parameter model built on the Llama 3.1 architecture, distinguished by its advanced reasoning and reflection capabilities. It was trained using a specific system prompt that encourages complex thought processes and self-correction, leveraging the G-reen/reflexion-agi dataset.
Key Capabilities & Performance
This model demonstrates exceptional performance across various benchmarks, achieving 100% scores in 0-shot Reflection settings for:
- GPQA
- MMLU
- HumanEval
- MATH
- GSM8K
- IFEval
Notably, it scored 0% on TruthfulQA, indicating a focus on logical consistency over factual truthfulness in some contexts. Contamination testing showed 0% contamination across the high-scoring benchmarks.
Recommended Usage
To achieve optimal results, users should employ the specific system prompt used during training:
You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.The model utilizes the standard Llama 3.1 chat format, ensuring compatibility with existing Llama 3.1 integrations. This model is particularly well-suited for use cases demanding high accuracy in reasoning, problem-solving, and tasks where self-correction mechanisms are beneficial.