gbueno86/Meta-LLama-3-Cat-Smaug-LLama-70b

Hugging Face
Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Context Length: 8K · Published: May 24, 2024 · License: llama3 · Architecture: Transformer

gbueno86/Meta-LLama-3-Cat-Smaug-LLama-70b is a 70-billion-parameter language model, a merge of Meta-LLama-3-Cat-A-LLama-70b and abacusai_Smaug-Llama-3-70B-Instruct using the SLERP method. It is designed for general language understanding and generation, demonstrating logical reasoning, problem-solving, and creative text generation across a range of benchmarks.


Model Overview

This model, gbueno86/Meta-LLama-3-Cat-Smaug-LLama-70b, is a 70-billion-parameter language model created by merging two pre-trained models: Meta-LLama-3-Cat-A-LLama-70b and abacusai_Smaug-Llama-3-70B-Instruct. The merge was performed with SLERP (spherical linear interpolation), which blends the two models' weights along the arc between them rather than averaging them linearly, with the aim of preserving the strengths of both parents. The model supports a context length of 8192 tokens.
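
To make the merge method concrete, here is a minimal sketch of SLERP between two flattened weight vectors. This is a simplification for illustration only: real merge tooling (e.g. mergekit) applies the interpolation per tensor, often with per-layer interpolation factors, and the exact configuration used for this merge is not stated here.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t=0 returns v0, t=1 returns v1; intermediate t follows the arc
    between the two directions instead of the straight chord.
    """
    # Normalize copies to measure the angle between the two vectors
    v0n = v0 / (np.linalg.norm(v0) + eps)
    v1n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```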

Key Capabilities

  • Logical Reasoning: Demonstrates step-by-step reasoning for complex problems, such as the "ball in the microwave" and "killers in a room" scenarios.
  • Problem Solving: Capable of breaking down multi-step problems, like calculating ways to open doors and windows for airflow.
  • Code Generation: Can generate functional code, as shown by the Pygame "Snake" game example.
  • Creative Text Generation: Able to produce creative content, including poems and horror stories, while following specific thematic instructions.
  • Instruction Following: Accurately processes and responds to diverse user prompts, including JSON generation requests.
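
Since both parent models are Llama 3 derivatives, prompts presumably follow the standard Llama 3 Instruct chat template. The sketch below (covering the instruction-following and JSON-generation use cases above) assumes that template applies; verify against the model card before relying on it:

```text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Generate a JSON object with keys "name" and "age".<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```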

Performance Highlights

Evaluations on the Open LLM Leaderboard show a strong average score of 38.27. Notable scores include:

  • IFEval (0-Shot): 80.72
  • BBH (3-Shot): 51.51
  • MMLU-PRO (5-Shot): 45.28

Good For

This model is suitable for applications requiring robust general-purpose language understanding, logical deduction, and creative content generation. Its merged architecture aims to leverage the strengths of its constituent models, making it a versatile choice for tasks ranging from complex reasoning to interactive conversational agents.

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model tune the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
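
As an illustration of where these parameters go, the sketch below builds a request payload for an OpenAI-compatible chat completions endpoint. The values are placeholders, not the actual top configurations from the Featherless dashboard, and `top_k`, `repetition_penalty`, and `min_p` are common server-side extensions rather than part of the core OpenAI API.

```python
# Hypothetical request payload; all sampler values below are
# illustrative placeholders, not measured "popular" settings.
payload = {
    "model": "gbueno86/Meta-LLama-3-Cat-Smaug-LLama-70b",
    "messages": [
        {"role": "user", "content": "Write a haiku about model merging."}
    ],
    "temperature": 0.7,        # randomness of sampling
    "top_p": 0.9,              # nucleus sampling cutoff
    "top_k": 40,               # sample only from the k most likely tokens
    "frequency_penalty": 0.0,  # penalize tokens by how often they appear
    "presence_penalty": 0.0,   # penalize tokens that appear at all
    "repetition_penalty": 1.1, # multiplicative repeat penalty (extension)
    "min_p": 0.05,             # minimum relative probability (extension)
}
```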