Gen-Verse/DemyAgent-4B
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Sep 29, 2025 | Architecture: Transformer

DemyAgent-4B by Gen-Verse is a 4-billion-parameter agentic reasoning model with a 40,960-token context length. It is trained with the GRPO-TCR recipe on 30K high-quality agentic RL data points and achieves strong results on challenging benchmarks such as AIME2024/2025, GPQA-Diamond, and LiveCodeBench-v6. The model shows that, with an effective reinforcement learning strategy, a small model can outperform much larger alternatives (14B/32B) on complex problem-solving tasks.

DemyAgent-4B: Agentic Reasoning with Reinforcement Learning

DemyAgent-4B, developed by Gen-Verse, is a 4 billion parameter model specifically designed for agentic reasoning tasks. It leverages a novel GRPO-TCR training recipe and 30,000 high-quality agentic RL data points to achieve competitive performance against significantly larger models (14B/32B parameters).
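For readers unfamiliar with GRPO, the following is a minimal sketch of the group-relative advantage that generic GRPO computes: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its group. This illustrates only the standard GRPO idea; the TCR-specific modifications in the GRPO-TCR recipe are not described here and are not modeled. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    rewards: rewards for several responses sampled from the SAME prompt.
    eps: small constant (an assumption) to avoid division by zero when
         every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts: two solved the task, two did not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
print(flat)
```

Successful rollouts receive a positive advantage and failed ones a negative advantage, without a separate value network. When every rollout in a group gets the same reward, all advantages are zero, which is one reason the variety and quality of the training data matter in this style of RL.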

Key Capabilities & Differentiators

  • Exceptional Reasoning: Achieves state-of-the-art results on AIME2025 (70.0%) and strong performance on AIME2024 (72.6%) and GPQA-Diamond (58.5%), often outperforming models with roughly 3.5x to 8x as many parameters (14B/32B).
  • Efficient Agentic Performance: Demonstrates that effective Reinforcement Learning strategies, particularly with high-quality, real end-to-end trajectories, enable smaller models to excel in complex agentic tasks.
  • Optimized Tool Use: Reasons deliberately and calls tools selectively, which makes it more efficient than long chain-of-thought (long-CoT) models.
  • Data-Driven Approach: Highlights the critical role of data quality, training efficiency (exploration-friendly techniques), and reasoning strategy in agentic RL.

Ideal Use Cases

  • Complex Problem Solving: Suited for applications requiring advanced mathematical, scientific, and code-related reasoning.
  • Resource-Constrained Environments: Fits agentic workloads where compute is limited, since it delivers its performance at the 4B scale.
  • Agent Development: Useful for developers building intelligent agents that require robust reasoning and strategic tool invocation.
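To make the resource-constrained use case concrete, here is a rough back-of-the-envelope sketch of the memory needed just to hold model weights in BF16 (2 bytes per parameter), comparing the 4B size with the 14B and 32B sizes mentioned above. The parameter counts are the nominal rounded figures, not exact counts, and the estimate deliberately ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The helper name is hypothetical.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in GiB.

    bytes_per_param=2 corresponds to BF16. KV cache, activations, and
    framework overhead are NOT included (a simplifying assumption).
    """
    return num_params * bytes_per_param / 2**30


# Nominal (rounded) parameter counts; exact counts may differ.
sizes = {"4B": 4e9, "14B": 14e9, "32B": 32e9}
estimates = {name: weight_memory_gib(n) for name, n in sizes.items()}

for name, gib in estimates.items():
    print(f"{name}: ~{gib:.1f} GiB of BF16 weights")
```

Under these assumptions the 4B weights come to roughly 7.5 GiB, against roughly 26 GiB and 60 GiB for the 14B and 32B sizes, which is the difference between fitting on a single consumer GPU and needing datacenter-class hardware.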