Model Overview
vicgalle/CarbonBeagle-11B-truthy is a 10.7-billion-parameter language model for natural language understanding and reasoning tasks. Its context length is 4096 tokens, so the prompt and the generated output together must fit within that window. The model has been evaluated on the Open LLM Leaderboard, and the results are summarized in the sections that follow.
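The 4096-token limit can be made concrete with a small sketch. It assumes the model loads through the standard Hugging Face `transformers` causal-LM classes and that the `accelerate` package is installed for `device_map="auto"`; neither is confirmed by this overview. The helper names `max_new_tokens` and `generate` are hypothetical, chosen only for illustration.

```python
MODEL_ID = "vicgalle/CarbonBeagle-11B-truthy"
CONTEXT_LENGTH = 4096  # context window stated in the overview


def max_new_tokens(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens remain for generation after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


def generate(prompt: str, requested_new_tokens: int = 256) -> str:
    """Hypothetical helper: generate text while staying inside the context window.

    Assumes the model works with AutoModelForCausalLM; the import is done
    lazily so that max_new_tokens() is usable without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]

    budget = max_new_tokens(prompt_len)
    if budget == 0:
        raise ValueError("prompt already fills the 4096-token context window")

    output_ids = model.generate(
        **inputs, max_new_tokens=min(requested_new_tokens, budget)
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

For example, a prompt of 1000 tokens leaves `max_new_tokens(1000)` = 3096 tokens for the response. Whether the model expects a particular prompt or chat template is not stated here, so the plain-string prompt is also an assumption.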
Key Capabilities & Performance
The model achieves an average score of 76.10 across the six benchmarks of the primary Open LLM Leaderboard evaluation:
- AI2 Reasoning Challenge (25-shot): 72.27
- HellaSwag (10-shot): 89.31
- MMLU (5-shot): 66.55
- TruthfulQA (0-shot): 78.55
- Winogrande (5-shot): 83.82
- GSM8k (5-shot): 66.11
A separate, harder Open LLM Leaderboard benchmark set yields a much lower average of 21.29:
- IFEval (0-shot): 52.12
- BBH (3-shot): 33.99
- MATH Lvl 5 (4-shot): 4.76
- GPQA (0-shot): 6.60
- MuSR (0-shot): 4.11
- MMLU-PRO (5-shot): 26.19
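As a quick sanity check, the two reported averages can be recomputed from the per-benchmark scores listed above. This is a minimal sketch: the dictionary names `PRIMARY_SCORES` and `SECONDARY_SCORES` and the function `leaderboard_average` are illustrative labels, and it assumes each average is the unweighted mean of its six scores.

```python
from statistics import mean

# Scores copied from the two benchmark lists above.
PRIMARY_SCORES = {
    "AI2 Reasoning Challenge": 72.27,
    "HellaSwag": 89.31,
    "MMLU": 66.55,
    "TruthfulQA": 78.55,
    "Winogrande": 83.82,
    "GSM8k": 66.11,
}

SECONDARY_SCORES = {
    "IFEval": 52.12,
    "BBH": 33.99,
    "MATH Lvl 5": 4.76,
    "GPQA": 6.60,
    "MuSR": 4.11,
    "MMLU-PRO": 26.19,
}


def leaderboard_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the benchmark scores (assumed averaging rule)."""
    return mean(scores.values())


for name, scores in [("primary", PRIMARY_SCORES), ("secondary", SECONDARY_SCORES)]:
    print(f"{name}: {leaderboard_average(scores):.3f}")
```

Both recomputed means agree with the reported 76.10 and 21.29 to within rounding, which supports reading each figure as a plain average over its six benchmarks.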
Use Cases
Given its strong results on commonsense reasoning (HellaSwag 89.31, Winogrande 83.82) and truthfulness (TruthfulQA 78.55), CarbonBeagle-11B-truthy is a reasonable fit for question answering and general text generation where factual correctness is important. Its MMLU (66.55) and GSM8k (66.11) scores suggest utility for academic knowledge questions and grade-school math word problems. The MATH Lvl 5 score of 4.76, however, shows that more complex mathematical problems remain a clear weakness.