EphAsad/AristaeusAgent
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Apr 27, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold
EphAsad/AristaeusAgent is a QLoRA fine-tune of EphAsad/Aristaeus, which is based on Qwen2.5-1.5B-Instruct. This model is specifically trained for structured agentic tool-use, featuring a unique -before-act behavior and Hermes-style tool-call format. It excels at multi-step planning and reasoning quality, making it suitable for tasks requiring deliberate tool selection and execution, despite some limitations in tool refusal.
Loading preview...
AristaeusAgent: Two-Stage Agentic Tool-Use Model
AristaeusAgent is a QLoRA fine-tune of EphAsad/Aristaeus, built upon Qwen2.5-1.5B-Instruct. It represents Stage 2 of a two-stage training pipeline, focusing on adding structured agentic tool-use capabilities to the reasoning foundation established in Stage 1 (Aristaeus).
Key Capabilities
- Agentic Tool-Calling: Implements a
<think>...</think>before<tool_call>...</tool_call>behavior, enabling deliberate reasoning prior to tool invocation. - Hermes-style Format: Utilizes a specific JSON-based tool-call format wrapped in
<tool_call>tags, distinct from the base model's raw JSON output. - Enhanced Reasoning & Planning: Achieved a +17.4 percentage point improvement in overall benchmark score compared to its base, with significant gains in Reasoning Quality (+20 points) and Multi-Step Planning (+14 points).
- QLoRA Fine-tuning: Employs QLoRA (r=16, alpha=32) to preserve Stage 1's chain-of-thought reasoning while extending it with agentic capabilities.
Good for
- Proof-of-Concept for Agentic Pipelines: Demonstrates the viability of a two-stage reasoning-to-agentic pipeline at 1.5B parameters using open datasets.
- Tasks Requiring Deliberate Tool Use: Ideal for scenarios where explicit reasoning and structured tool invocation are critical.
- Research into Agentic LLMs: Provides a foundation for exploring agentic behaviors and tool-calling mechanisms in smaller models.
Limitations
- Tool Over-triggering: The primary limitation is a tendency to call tools unnecessarily for static knowledge questions, scoring lower on Tool Refusal than the base model. This can be partially mitigated with explicit system prompt instructions.
- Hallucination at 1.5B: Due to its parameter size, the model may confabulate supporting details. For production use, a larger base model (e.g., Qwen2.5-3B or 7B) is recommended.
- Recursive Reasoning Failure: Inherits a limitation from Stage 1 where deep recursive call stacks can cause the model to lose context.