rubenroy/Gilgamesh-72B
Gilgamesh-72B is a 72.7 billion parameter causal language model developed by Ruben Roy, fine-tuned from Alibaba's Qwen 2.5 72B Instruct. This model is specifically optimized for factual accuracy, mathematical capabilities, and reasoning, leveraging specialized datasets like GammaCorpus-CoT-Math-170k and GammaCorpus-Fact-QA-450k. It is designed for complex problem-solving and robust knowledge retrieval.
Gilgamesh 72B Overview
Gilgamesh 72B, developed by Ruben Roy and funded by The Ovantage Society, is a 72.7 billion parameter causal language model. It is a fine-tune of Alibaba's Qwen 2.5 72B Instruct, built on a transformer architecture featuring RoPE, SwiGLU, RMSNorm, and attention QKV bias. The model has 80 layers and uses grouped-query attention with 64 query heads and 8 key/value heads.
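The 64-query/8-KV head split above is a grouped-query attention (GQA) layout, in which consecutive query heads share a key/value head. A minimal sketch of that mapping (the constants come from the figures stated here; the helper `kv_head_for_query` is illustrative, not part of any model API):

```python
# Sketch of the grouped-query attention (GQA) head layout described for
# Gilgamesh 72B: 64 query heads share 8 key/value heads per layer, so each
# KV head serves a contiguous group of 8 query heads.

N_LAYERS = 80      # transformer layers
N_Q_HEADS = 64     # query heads per layer
N_KV_HEADS = 8     # key/value heads per layer

GROUP_SIZE = N_Q_HEADS // N_KV_HEADS  # query heads per KV head (8)

def kv_head_for_query(q_head: int) -> int:
    """Return the index of the KV head that query head `q_head` shares."""
    return q_head // GROUP_SIZE

# Query heads 0-7 share KV head 0, heads 8-15 share KV head 1, and so on.
mapping = {q: kv_head_for_query(q) for q in range(N_Q_HEADS)}
```

Compared with full multi-head attention, this reduces the KV cache per layer by a factor of `GROUP_SIZE`, which matters at 72B scale and long contexts.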
Key Capabilities
- Enhanced Factual Accuracy: Trained on the GammaCorpus-Fact-QA-450k dataset to improve knowledge retrieval and factual correctness.
- Advanced Mathematical Reasoning: Utilizes the GammaCorpus-CoT-Math-170k dataset, focusing on Chain-of-Thought (CoT) reasoning to improve step-by-step problem-solving in mathematics.
- Broad Knowledge Base: Incorporates the GammaCorpus-v2-5m dataset to ensure a wide range of general knowledge and conversational abilities.
Good for
- Applications requiring high factual precision.
- Tasks involving complex mathematical problem-solving and logical reasoning.
- General-purpose conversational AI where robust knowledge and reasoning are critical.
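For conversational use, Gilgamesh 72B presumably inherits the ChatML-style chat template of its Qwen 2.5 Instruct base (an assumption; in practice, prefer the model tokenizer's own `apply_chat_template`). A minimal sketch of that prompt format:

```python
# Minimal ChatML-style prompt builder, assuming Gilgamesh 72B keeps the
# Qwen 2.5 Instruct template (<|im_start|>role ... <|im_end|>). This is an
# illustrative sketch; real usage should rely on the tokenizer's chat template.

def build_prompt(messages):
    """Render a list of {"role", "content"} dicts into a ChatML prompt string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to generate a reply
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 17 * 24? Show your reasoning."},
])
```

Ending the prompt at `<|im_start|>assistant` leaves the model to produce the assistant turn, which suits the step-by-step CoT behavior the math fine-tuning targets.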