vicgalle/CarbonBeagle-11B-truthy is a 10.7-billion-parameter language model with a 4096-token context length. It averages 76.10 on the Open LLM Leaderboard, with its strongest results on HellaSwag (89.31) and Winogrande (83.82), both commonsense reasoning benchmarks. Its TruthfulQA score of 78.55 indicates that it tends to avoid repeating common misconceptions.
Model Overview
vicgalle/CarbonBeagle-11B-truthy is a 10.7-billion-parameter language model for general natural language understanding and reasoning tasks. Its 4096-token context length bounds the combined size of the prompt and the generated output. The model has been evaluated on the Open LLM Leaderboard, and the results are summarized in the sections that follow.
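A minimal sketch of how the 4096-token context length constrains a request. It assumes that prompt tokens and generated tokens share the same window, as is usual for decoder-only models, and that token counts come from the model's own tokenizer. The helper names `fits_in_context` and `max_completion_tokens` are hypothetical and not part of any library.

```python
# Context window stated in the model description.
CONTEXT_LENGTH = 4096


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if the prompt plus the requested completion fits in the window.

    Hypothetical helper: assumes prompt and output share one window.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_length


def max_completion_tokens(prompt_tokens: int,
                          context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens remain for generation after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return max(0, context_length - prompt_tokens)


if __name__ == "__main__":
    # A 3000-token prompt leaves room for a 1000-token completion.
    print(fits_in_context(3000, 1000))
    # A 3500-token prompt does not.
    print(fits_in_context(3500, 1000))
    # Tokens left for generation after a 4000-token prompt.
    print(max_completion_tokens(4000))
```

A check like this is useful before sending long documents, since input that exceeds the window has to be truncated or split.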
Key Capabilities & Performance
The model achieves an average score of 76.10 on the primary Open LLM Leaderboard evaluation. Specific benchmark results highlight its strengths:
- AI2 Reasoning Challenge (25-shot): 72.27
- HellaSwag (10-shot): 89.31
- MMLU (5-shot): 66.55
- TruthfulQA (0-shot): 78.55
- Winogrande (5-shot): 83.82
- GSM8k (5-shot): 66.11
Evaluations on a second Open LLM Leaderboard benchmark set (the suite used by leaderboard v2) give an average of 21.29, with these individual scores:
- IFEval (0-shot): 52.12
- BBH (3-shot): 33.99
- MATH Lvl 5 (4-shot): 4.76
- GPQA (0-shot): 6.60
- MuSR (0-shot): 4.11
- MMLU-PRO (5-shot): 26.19
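The two reported averages can be checked against the individual scores listed above. The sketch below assumes each average is an unweighted mean of its six benchmark scores, which is consistent with the published figures but is not stated on this page. The dictionary names are illustrative only.

```python
from statistics import mean

# Scores from the first benchmark list (reported average: 76.10).
first_set = {
    "AI2 Reasoning Challenge": 72.27,
    "HellaSwag": 89.31,
    "MMLU": 66.55,
    "TruthfulQA": 78.55,
    "Winogrande": 83.82,
    "GSM8k": 66.11,
}

# Scores from the second benchmark list (reported average: 21.29).
second_set = {
    "IFEval": 52.12,
    "BBH": 33.99,
    "MATH Lvl 5": 4.76,
    "GPQA": 6.60,
    "MuSR": 4.11,
    "MMLU-PRO": 26.19,
}

# Assumption: the leaderboard average is a plain unweighted mean.
avg_first = mean(first_set.values())
avg_second = mean(second_set.values())

print(f"First set mean:  {avg_first:.3f}")
print(f"Second set mean: {avg_second:.3f}")
```

Both means land within 0.01 of the reported averages. The second works out to roughly 21.295, so the published 21.29 may reflect truncation or rounding of more precise underlying scores.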
Use Cases
Given its strong results on commonsense reasoning (HellaSwag, Winogrande) and on TruthfulQA, CarbonBeagle-11B-truthy is suited to information retrieval, question answering, and general text generation where factual correctness matters. Its MMLU (66.55) and GSM8k (66.11) scores suggest moderate utility for academic knowledge questions and grade-school math word problems, while the MATH Lvl 5 score of 4.76 shows that competition-level mathematics remains a weak point.