itsliupeng/llama2_70b_mmlu
The itsliupeng/llama2_70b_mmlu model is a 69-billion-parameter variant of Llama-2-70b-hf, continuously trained by itsliupeng on the mmlu_recall dataset. It is specifically optimized to raise MMLU scores while maintaining performance on other metrics. The model is designed for tasks requiring strong multi-task language understanding and reasoning, with a context length of 32768 tokens.
Overview
The itsliupeng/llama2_70b_mmlu model is a 69-billion-parameter language model derived from the Llama-2-70b-hf architecture. Developed by itsliupeng, it was continuously trained on the mmlu_recall dataset. The primary objective of this training regimen is to significantly improve performance on the Massive Multitask Language Understanding (MMLU) benchmark without degrading capabilities on other evaluation metrics.
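The card does not include loading instructions, so the following is a minimal sketch using the Hugging Face transformers library; the repository id comes from this page, while the dtype and device settings are assumptions suited to a multi-GPU setup, not values published by the author.

```python
# Minimal loading sketch (dtype/device settings are assumptions, not from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "itsliupeng/llama2_70b_mmlu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 70B weights tractable
    device_map="auto",          # shard layers across available GPUs
)
```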
Key Capabilities & Performance
This model demonstrates strong performance across standard benchmarks, as reported on the Open LLM Leaderboard. Its targeted optimization for MMLU makes it particularly adept at complex reasoning and knowledge-intensive tasks; a sketch for reproducing these scores follows the list below.
- Average Score: 68.24 on the Open LLM Leaderboard.
- MMLU (5-Shot): Scores 71.89, highlighting its enhanced multi-task language understanding.
- AI2 Reasoning Challenge (25-Shot): Performs well with a score of 65.61.
- HellaSwag (10-Shot): Demonstrates strong common-sense reasoning with 87.37.
- Winogrande (5-Shot): Achieves 82.40, indicating proficiency in resolving pronoun ambiguity.
- GSM8k (5-Shot): Scores 52.99 on mathematical reasoning tasks.
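These figures come from the Open LLM Leaderboard, which is driven by EleutherAI's lm-evaluation-harness. The sketch below shows one plausible way to reproduce the MMLU number locally; the `mmlu` task name and the `simple_evaluate` API are assumptions based on recent harness versions, and the leaderboard pins specific harness revisions, so exact scores may differ slightly.

```python
# Hedged sketch: reproducing the 5-shot MMLU score with lm-evaluation-harness.
# Assumes a recent (v0.4+) harness; the leaderboard used a pinned revision,
# so results may not match to the decimal.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=itsliupeng/llama2_70b_mmlu,dtype=float16",
    tasks=["mmlu"],   # the 57-subject MMLU suite
    num_fewshot=5,    # matches the leaderboard's 5-shot setting
    batch_size=4,
)
print(results["results"])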
When to Use This Model
This model is particularly well-suited for applications requiring robust language understanding and reasoning, especially where MMLU performance is a critical factor. Its continuous training on the mmlu_recall dataset makes it a strong candidate for tasks that benefit from improved accuracy in diverse knowledge domains and problem-solving scenarios.
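As a concrete starting point, here is a minimal inference sketch using the transformers text-generation pipeline; the prompt and generation settings are illustrative assumptions, not values from the model card.

```python
# Minimal inference sketch (illustrative prompt and settings, not from the card).
import torch
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="itsliupeng/llama2_70b_mmlu",
    torch_dtype=torch.float16,  # half precision for the 70B weights
    device_map="auto",          # shard across available GPUs
)

prompt = "Question: Which organelle is the site of oxidative phosphorylation?\nAnswer:"
output = generate(prompt, max_new_tokens=64, do_sample=False)
print(output[0]["generated_text"])
```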