arcee-ai/Trinity-Large-Thinking

Hugging Face
TEXT GENERATION · Concurrency Cost: 4 · Model Size: 399B · Quant: FP8 · Ctx Length: 32k · Published: Apr 1, 2026 · License: apache-2.0 · Architecture: Transformer · 0.2K · Open Weights · Warm

Trinity-Large-Thinking is a 398B-parameter sparse Mixture-of-Experts (MoE) model from Arcee AI, with approximately 13B active parameters per token. This variant is reasoning-optimized and post-trained with extended chain-of-thought and agentic RL. It excels in agentic benchmarks and is purpose-built for tool calling, multi-step planning, and agent workflows, generating explicit reasoning traces in `<think>...</think>` blocks.


Trinity-Large-Thinking: Agentic Reasoning MoE

Trinity-Large-Thinking is a 398B-parameter sparse Mixture-of-Experts (MoE) model developed by Arcee AI, featuring approximately 13B active parameters per token. It is a reasoning-optimized variant of the Trinity-Large family, post-trained with extended chain-of-thought reasoning and agentic Reinforcement Learning (RL).

Key Capabilities & Differentiators

  • Agentic-first design: Specifically engineered for tool calling, complex multi-step planning, and integration into agent workflows.
  • Native Reasoning Traces: Generates explicit chain-of-thought reasoning within <think>...</think> blocks, which are crucial for its performance and must be preserved in context for multi-turn interactions.
  • High Agentic Performance: Achieves strong results on agentic benchmarks, including 94.7% on τ²-Bench, 91.9% on PinchBench, and 98.2% on LiveCodeBench.
  • Extended Context Window: Features a 512k context length, accommodating long reasoning chains across many agentic steps.
  • Framework Compatibility: Works out-of-the-box with major agent frameworks like OpenClaw and Hermes Agent.
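Because the model emits its chain of thought inside `<think>...</think>` blocks, downstream code often needs to separate the reasoning trace from the user-visible answer. A minimal sketch (the helper name and return shape are illustrative, not part of any Arcee AI SDK):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    Joins all <think>...</think> blocks into one reasoning string and
    strips them from the visible answer.
    """
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

raw = "<think>The user asked for 2+2, which is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # -> The answer is 4.
```

In practice, inference servers with a reasoning parser enabled (e.g. vLLM) perform this split for you and return the trace in a separate field.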

Usage Considerations

For optimal performance, especially in multi-turn conversations and agentic loops, it is critical to preserve the model's reasoning_content (the content within <think>...</think> blocks) in the message history. Omitting this can degrade multi-step performance and lead to malformed tool calls. The model is available via vLLM, Transformers, and OpenRouter, with specific configurations for reasoning and tool call parsing.
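One way to keep the reasoning in context is to copy it back into the message history on each turn. The sketch below assumes a reply shaped like vLLM's OpenAI-compatible response with a reasoning parser enabled (the `reasoning_content` field name is an assumption; check your server version):

```python
def append_assistant_turn(history: list[dict], reply: dict) -> list[dict]:
    """Append an assistant reply, preserving its reasoning_content.

    Keeping the chain-of-thought in the history lets later turns in an
    agentic loop see the model's prior reasoning.
    """
    msg = {"role": "assistant", "content": reply.get("content", "")}
    if reply.get("reasoning_content"):
        msg["reasoning_content"] = reply["reasoning_content"]
    history.append(msg)
    return history

history = [{"role": "user", "content": "Plan the API calls for this task."}]
reply = {
    "content": "Call fetch_user, then update_profile.",
    "reasoning_content": "Two tools are needed; fetch first to get the id.",
}
append_assistant_turn(history, reply)
```

Dropping the `reasoning_content` line here is exactly the omission the note above warns against.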

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model tune the following samplers (the specific values for each config are shown on the model page):

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
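As a rough sketch of how these parameters are passed to an OpenAI-compatible endpoint: the first four are standard fields, while the rest are server extensions. All values below are illustrative placeholders, not the actual user configs:

```python
# Standard OpenAI-schema sampler fields.
sampler_params = {
    "temperature": 0.7,
    "top_p": 0.95,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}

# top_k, repetition_penalty, and min_p are not in the OpenAI schema; with
# the OpenAI Python client they are typically sent via extra_body, which
# vLLM-style servers accept.
extra_body = {
    "top_k": 40,
    "repetition_penalty": 1.05,
    "min_p": 0.05,
}

request_kwargs = {**sampler_params, "extra_body": extra_body}
```

Consult your serving stack's documentation for which extension fields it actually honors.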