rubenroy/Gilgamesh-72B

Text Generation · Concurrency Cost: 4 · Model Size: 72.7B · Quant: FP8 · Ctx Length: 32k · Published: Jun 4, 2025 · License: other · Architecture: Transformer

Gilgamesh-72B is a 72.7 billion parameter causal language model developed by Ruben Roy, fine-tuned from Alibaba's Qwen 2.5 72B Instruct. This model is specifically optimized for factual accuracy, mathematical capabilities, and reasoning, leveraging specialized datasets like GammaCorpus-CoT-Math-170k and GammaCorpus-Fact-QA-450k. It is designed for complex problem-solving and robust knowledge retrieval.


Gilgamesh 72B Overview

Gilgamesh 72B, developed by Ruben Roy and funded by The Ovantage Society, is a 72.7 billion parameter causal language model. It is a fine-tune of Alibaba's Qwen 2.5 72B Instruct, built on a transformer architecture featuring RoPE, SwiGLU, RMSNorm, and attention QKV bias. The model has 80 layers and uses grouped-query attention with 64 query heads and 8 key/value heads.
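Because Gilgamesh 72B is a Qwen 2.5 fine-tune, it should load through the standard Hugging Face transformers causal-LM path. The sketch below is a minimal, hedged example: the chat-template usage follows the usual Qwen 2.5 convention and is an assumption rather than an excerpt from the model card, and the dtype/device settings are illustrative only.

```python
# Minimal sketch: loading Gilgamesh-72B with Hugging Face transformers.
# Assumes a Qwen 2.5-style chat template; device_map="auto" requires `accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rubenroy/Gilgamesh-72B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 72.7B model needs multiple GPUs or offloading
    device_map="auto",
)

messages = [{"role": "user", "content": "What is the capital of Australia?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```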

Key Capabilities

  • Enhanced Factual Accuracy: Trained on the GammaCorpus-Fact-QA-450k dataset to improve knowledge retrieval and factual correctness.
  • Advanced Mathematical Reasoning: Utilizes the GammaCorpus-CoT-Math-170k dataset, focusing on Chain-of-Thought (CoT) reasoning to improve step-by-step problem-solving in mathematics (a prompting sketch follows this list).
  • Broad Knowledge Base: Incorporates the GammaCorpus-v2-5m dataset to ensure a wide range of general knowledge and conversational abilities.
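As a hedged illustration of the Chain-of-Thought behavior described above, the following sketch prompts the model to reason step by step on a math problem. The system prompt wording and the generation settings are assumptions for demonstration, not documented defaults; `model` and `tokenizer` are the objects loaded in the previous sketch.

```python
# Hypothetical CoT-style math prompt; the system prompt wording is an assumption.
messages = [
    {"role": "system", "content": "Reason step by step, then state the final answer."},
    {"role": "user", "content": "A train travels 180 km in 2.5 hours. What is its average speed in km/h?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# A low temperature tends to help on deterministic math tasks (an assumption, not a documented default).
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.3, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```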

Good for

  • Applications requiring high factual precision.
  • Tasks involving complex mathematical problem-solving and logical reasoning.
  • General-purpose conversational AI where robust knowledge and reasoning are critical.

Popular Sampler Settings

The top three sampler configurations used by Featherless users for this model tune the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
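For reference, these sampler parameters map directly onto the request fields of an OpenAI-compatible chat completions API. The sketch below is a hypothetical example: the base URL, all parameter values, and the use of the `openai` Python client's `extra_body` for the non-standard fields (top_k, repetition_penalty, min_p) are assumptions for illustration, not settings taken from this page.

```python
# Hypothetical request showing where each sampler parameter goes.
# The base URL and every value here are assumptions, not the actual top configs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="rubenroy/Gilgamesh-72B",
    messages=[{"role": "user", "content": "Prove that the sum of two even integers is even."}],
    temperature=0.7,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    extra_body={  # fields outside the OpenAI schema ride in extra_body
        "top_k": 40,
        "repetition_penalty": 1.05,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```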