G-reen/gpt5o-reflexion-q-agi-llama-3.1-8b

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:Sep 7, 2024License:mitArchitecture:Transformer0.1K Open Weights Warm

G-reen/gpt5o-reflexion-q-agi-llama-3.1-8b is a 7 billion parameter language model based on the Llama 3.1 architecture, fine-tuned for advanced reasoning and reflection capabilities. It achieves 100% scores on several benchmarks including GPQA, MMLU, HumanEval, MATH, GSM8K, and IFEval through a 0-shot Reflection approach. This model is specifically designed to excel in complex problem-solving and self-correction tasks, making it suitable for applications requiring high accuracy in logical and analytical domains.

Loading preview...

Model Overview

G-reen/gpt5o-reflexion-q-agi-llama-3.1-8b is a 7 billion parameter model built on the Llama 3.1 architecture, distinguished by its advanced reasoning and reflection capabilities. It was trained using a specific system prompt that encourages complex thought processes and self-correction, leveraging the G-reen/reflexion-agi dataset.

Key Capabilities & Performance

This model demonstrates exceptional performance across various benchmarks, achieving 100% scores in 0-shot Reflection settings for:

  • GPQA
  • MMLU
  • HumanEval
  • MATH
  • GSM8K
  • IFEval

Notably, it scored 0% on TruthfulQA, indicating a focus on logical consistency over factual truthfulness in some contexts. Contamination testing showed 0% contamination across the high-scoring benchmarks.

Recommended Usage

To achieve optimal results, users should employ the specific system prompt used during training:

You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.

The model utilizes the standard Llama 3.1 chat format, ensuring compatibility with existing Llama 3.1 integrations. This model is particularly well-suited for use cases demanding high accuracy in reasoning, problem-solving, and tasks where self-correction mechanisms are beneficial.