gbueno86/Cathallama-70B
Cathallama-70B by gbueno86 is a 70-billion-parameter instruction-tuned language model with an 8192-token context length. It was created by merging Meta-Llama-3.1-70B-Instruct, turboderp/Cat-Llama-3-70B-instruct, and Nexusflow/Athene-70B. On MMLU-PRO it achieves a 9% higher overall success rate than LLaMA 3.1 70B and scores well across several categories. It is designed for general conversational and reasoning tasks, with particularly strong results in areas such as Psychology, Economics, and Computer Science.
Cathallama-70B: A Merged LLaMA 3.1 Variant
Cathallama-70B is a 70-billion-parameter instruction-tuned model developed by gbueno86 and built on the LLaMA 3.1 architecture. It was created by merging three distinct models (Meta-Llama-3.1-70B-Instruct, turboderp/Cat-Llama-3-70B-instruct, and Nexusflow/Athene-70B) with the aim of combining their strengths.
Key Performance Highlights
- MMLU-PRO Improvement: Achieves a 9% higher overall success rate on MMLU-PRO than the base LLaMA 3.1 70B model when tested at Q4_0 quantization.
- Category Scores: Reported MMLU-PRO category results include Psychology (85%), Economics (80%), and Computer Science (60%).
- Manual Testing: Showed robust performance in manual tests covering common sense, programming (e.g., JSON, Python snake game), and math tasks.
Creation Workflow
The model's development involved a multi-step merging process:
- Nexusflow_Athene was merged with Meta-Llama-3.1.
- turboderp_Cat was merged with Meta-Llama-3.1.
- The results of these two merges were then combined to form Cathallama.
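The card does not say which merge method or interpolation weights were used. As an illustration only, the two-stage workflow above can be sketched with SLERP (spherical linear interpolation, a method commonly used for pairwise model merges) at an assumed t = 0.5. The `slerp` and `merge` helpers and the tiny two-parameter "checkpoints" are hypothetical stand-ins. A real merge of 70B checkpoints would operate on full weight tensors with a dedicated merging tool.

```python
import math

# Hypothetical toy "checkpoint": parameter name -> flat list of weights.
Checkpoint = dict[str, list[float]]


def slerp(t: float, a: list[float], b: list[float], eps: float = 1e-6) -> list[float]:
    """Spherically interpolate from vector a (t=0) to vector b (t=1)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Cosine of the angle between the vectors, clamped into acos's domain.
    cos_theta = sum(x * y for x, y in zip(a, b)) / max(norm_a * norm_b, eps)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # (Anti)parallel vectors: SLERP is ill-defined, fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    w_a = math.sin((1 - t) * theta) / sin_theta
    w_b = math.sin(t * theta) / sin_theta
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def merge(t: float, first: Checkpoint, second: Checkpoint) -> Checkpoint:
    """Merge two checkpoints parameter by parameter.

    ASSUMPTION: SLERP with a single global t; the card states neither.
    """
    return {name: slerp(t, first[name], second[name]) for name in first}


# Hypothetical stand-ins for the three source models.
llama31 = {"w": [1.0, 0.0], "b": [0.5, 0.5]}
athene = {"w": [0.0, 1.0], "b": [0.5, 0.5]}
cat = {"w": [1.0, 1.0], "b": [0.0, 1.0]}

# Stage 1: merge each fine-tune with Meta-Llama-3.1 separately.
athene_llama = merge(0.5, llama31, athene)
cat_llama = merge(0.5, llama31, cat)

# Stage 2: combine the two intermediate merges into the final model.
cathallama = merge(0.5, athene_llama, cat_llama)
print(cathallama)
```

Replacing SLERP with a plain weighted average would change only the `slerp` function. The two-stage structure of the workflow stays the same.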
Use Cases
Cathallama-70B is suitable for a variety of general-purpose conversational and reasoning applications, especially where improved performance on complex academic and common-sense reasoning tasks is beneficial. Its strong MMLU-PRO scores suggest utility in educational, research, and analytical contexts.