grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 27, 2025 · License: llama3.1 · Architecture: Transformer

SauerHuatuoSkywork-o1-Llama-3.1-8B by grimjim is an 8-billion-parameter language model based on the Llama 3.1 architecture, with a 32,768-token context length. It is a merge of HuatuoSkywork-o1-Llama-3.1-8B and Llama-3.1-SauerkrautLM-8b-Instruct, designed to hybridize high-scoring Llama 3.1 instruct performance with o1-style reasoning capabilities. By integrating the reasoning strengths of its components, the merge aims to lift benchmark scores broadly, particularly on benchmarks other than IFEval.


Model Overview

The grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B is an 8 billion parameter language model built upon the Llama 3.1 architecture, featuring a 32768 token context length. It was created using the mergekit tool, specifically employing the SLERP merge method to combine two distinct Llama 3.1 8B models: grimjim/HuatuoSkywork-o1-Llama-3.1-8B and VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct.
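A mergekit SLERP configuration for this pairing would look something like the sketch below. The `layer_range`, `base_model` choice, interpolation weight `t`, and `dtype` are illustrative assumptions; the model card excerpt does not reproduce the actual config.

```yaml
# Hypothetical mergekit config: SLERP merge of the two parent models.
# layer_range, t, and dtype below are assumptions for illustration.
slices:
  - sources:
      - model: grimjim/HuatuoSkywork-o1-Llama-3.1-8B
        layer_range: [0, 32]
      - model: VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: grimjim/HuatuoSkywork-o1-Llama-3.1-8B
parameters:
  t: 0.5  # uniform interpolation weight (assumed)
dtype: bfloat16
```

A config like this is typically run with `mergekit-yaml config.yaml ./output-model`.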

Key Characteristics

  • Hybridized Reasoning: This model is an experimental merge designed to integrate the reasoning capabilities of the "o1" model with the strong performance of the "SauerkrautLM" model.
  • Benchmark Improvements: Although IFEval scores were lower than those of the SauerkrautLM component, the merge improved most other benchmark results, indicating a broader enhancement in capabilities.
  • Llama 3.1 Base: Leverages the foundational strengths of the Llama 3.1 series.
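The SLERP method used for this merge blends each pair of weight tensors along the arc between them rather than along a straight line, which preserves the geometry of the parent weights better than plain averaging. A minimal sketch of the per-tensor operation (simplified: real merges handle per-layer `t` schedules and other edge cases):

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation between flattened weight tensors a and b."""
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    # Angle between the two weight directions.
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * a + t * b  # nearly parallel: fall back to LERP
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b

# t=0 returns the first parent, t=1 the second; t=0.5 lies on the arc between them.
w = slerp(np.array([1.0, 0.0]), np.array([0.0, 1.0]), 0.5)
```

For orthogonal unit vectors, the midpoint stays on the unit sphere, which is the property that distinguishes SLERP from linear averaging (whose midpoint would have norm ≈ 0.707 in this case).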

Performance Benchmarks

Evaluated on the Open LLM Leaderboard, the model achieved an Average score of 26.63%. Specific benchmark results include:

  • IFEval (0-shot): 52.19%
  • BBH (3-shot): 32.09%
  • MMLU-PRO (5-shot): 33.23%

Use Cases

This model is suitable for applications requiring a balance of general language understanding and improved reasoning, particularly where the combined strengths of its merged components are beneficial. Its 32K context window supports processing longer inputs.
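As a rough illustration of working within the 32,768-token limit, the hypothetical helper below splits an over-long token sequence into overlapping windows; the function name and the 256-token overlap are assumptions for illustration, not part of the model card.

```python
CTX_LEN = 32768  # the model's context length

def chunk_tokens(tokens: list[int], max_len: int = CTX_LEN,
                 overlap: int = 256) -> list[list[int]]:
    """Split a token sequence into windows of at most max_len tokens,
    overlapping by `overlap` tokens so context carries across chunks."""
    if len(tokens) <= max_len:
        return [tokens]
    step = max_len - overlap
    return [tokens[i:i + max_len] for i in range(0, len(tokens) - overlap, step)]
```

Inputs that already fit return unchanged as a single chunk; longer inputs are windowed so every token appears in at least one chunk.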