arcee-ai/Trinity-Large-Thinking

Hugging Face
TEXT GENERATIONConcurrency Cost:4Model Size:399BQuant:FP8Ctx Length:32kPublished:Apr 1, 2026License:otherArchitecture:Transformer0.2K Warm

Trinity-Large-Thinking is a 398 billion parameter sparse Mixture-of-Experts (MoE) model by Arcee AI, with approximately 13 billion active parameters per token. This reasoning-optimized variant is post-trained with extended chain-of-thought and agentic RL, generating explicit reasoning traces in ... blocks. It delivers state-of-the-art performance on agentic benchmarks and is purpose-built for tool calling, multi-step planning, and agent workflows.

Loading preview...

Trinity-Large-Thinking: An Agentic MoE Model

Trinity-Large-Thinking is Arcee AI's 398 billion parameter sparse Mixture-of-Experts (MoE) model, featuring approximately 13 billion active parameters per token. It is a reasoning-optimized variant of the Trinity-Large family, post-trained with extended chain-of-thought reasoning and agentic Reinforcement Learning (RL).

Key Capabilities & Features

  • Agentic-first design: Specifically engineered for tool calling, multi-step planning, and complex agent workflows.
  • Native Reasoning Traces: Generates explicit chain-of-thought within <think>...</think> blocks, which are crucial for maintaining context in multi-turn conversations and agentic loops.
  • High Agentic Performance: Achieves 94.7% on τ²-Bench, 91.9% on PinchBench, and 98.2% on LiveCodeBench, demonstrating strong capabilities in agentic tasks.
  • Extensive Context Window: Features a 512k extended context window to accommodate long reasoning chains across many agentic steps.
  • Compatibility: Works out-of-the-box with major agent frameworks like OpenClaw and Hermes Agent.

Usage Considerations

For optimal performance, especially in multi-turn conversations and agentic loops, it is critical to preserve the model's reasoning_content (the content within <think>...</think> blocks) in the message history. Omitting this can degrade multi-step performance. The model is available via vLLM, Transformers, and OpenRouter.

Architecture

Built on a sparse MoE architecture with 256 experts (4 active), it was pretrained on 17 trillion tokens and post-trained with instruction tuning and agentic RL.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p