EphAsad/AristaeusAgent

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Apr 27, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

EphAsad/AristaeusAgent is a QLoRA fine-tune of EphAsad/Aristaeus, which is based on Qwen2.5-1.5B-Instruct. This model is specifically trained for structured agentic tool-use, featuring a unique -before-act behavior and Hermes-style tool-call format. It excels at multi-step planning and reasoning quality, making it suitable for tasks requiring deliberate tool selection and execution, despite some limitations in tool refusal.

Loading preview...

AristaeusAgent: Two-Stage Agentic Tool-Use Model

AristaeusAgent is a QLoRA fine-tune of EphAsad/Aristaeus, built upon Qwen2.5-1.5B-Instruct. It represents Stage 2 of a two-stage training pipeline, focusing on adding structured agentic tool-use capabilities to the reasoning foundation established in Stage 1 (Aristaeus).

Key Capabilities

  • Agentic Tool-Calling: Implements a <think>...</think> before <tool_call>...</tool_call> behavior, enabling deliberate reasoning prior to tool invocation.
  • Hermes-style Format: Utilizes a specific JSON-based tool-call format wrapped in <tool_call> tags, distinct from the base model's raw JSON output.
  • Enhanced Reasoning & Planning: Achieved a +17.4 percentage point improvement in overall benchmark score compared to its base, with significant gains in Reasoning Quality (+20 points) and Multi-Step Planning (+14 points).
  • QLoRA Fine-tuning: Employs QLoRA (r=16, alpha=32) to preserve Stage 1's chain-of-thought reasoning while extending it with agentic capabilities.

Good for

  • Proof-of-Concept for Agentic Pipelines: Demonstrates the viability of a two-stage reasoning-to-agentic pipeline at 1.5B parameters using open datasets.
  • Tasks Requiring Deliberate Tool Use: Ideal for scenarios where explicit reasoning and structured tool invocation are critical.
  • Research into Agentic LLMs: Provides a foundation for exploring agentic behaviors and tool-calling mechanisms in smaller models.

Limitations

  • Tool Over-triggering: The primary limitation is a tendency to call tools unnecessarily for static knowledge questions, scoring lower on Tool Refusal than the base model. This can be partially mitigated with explicit system prompt instructions.
  • Hallucination at 1.5B: Due to its parameter size, the model may confabulate supporting details. For production use, a larger base model (e.g., Qwen2.5-3B or 7B) is recommended.
  • Recursive Reasoning Failure: Inherits a limitation from Stage 1 where deep recursive call stacks can cause the model to lose context.