nttruong1007/qb-hermes3-llama8b

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 19, 2026License:llama3Architecture:Transformer Cold

Hermes 3 is the latest 8 billion parameter generalist language model from Nous Research, built upon the Llama-3.1 architecture with a 32768 token context length. It features significant improvements in agentic capabilities, roleplaying, reasoning, multi-turn conversation, and long context coherence. This model is specifically designed for user alignment, offering powerful steering capabilities, reliable function calling, structured output, and enhanced code generation skills.

Loading preview...

Hermes 3 - Llama-3.1 8B: An Advanced Generalist LLM

Hermes 3 is the newest iteration in Nous Research's flagship Hermes series, a generalist language model based on the Llama-3.1 architecture with 8 billion parameters and a 32768 token context window. This version introduces substantial enhancements over Hermes 2, focusing on aligning the LLM to the user with robust steering capabilities.

Key Capabilities & Improvements

  • Advanced Agentic Capabilities: Designed for more sophisticated autonomous task execution.
  • Enhanced Roleplaying & Reasoning: Significant improvements in conversational depth and logical inference.
  • Multi-turn Conversation & Long Context Coherence: Maintains consistency and understanding over extended dialogues.
  • Reliable Function Calling & Structured Output: Offers more powerful and dependable mechanisms for tool use and generating structured data, including a dedicated JSON mode.
  • Improved Code Generation: Better performance in generating programming code.
  • User Alignment: Emphasizes powerful steering capabilities and user control.

Performance & Benchmarks

Hermes 3 is competitive with, and in some areas superior to, Llama-3.1 Instruct models in general capabilities. Detailed benchmark comparisons are available in the Hermes 3 Technical Report. On the Open LLM Leaderboard, it achieves an average score of 23.49, with notable results in IFEval (61.70) and BBH (30.72).

Prompt Format

The model utilizes the ChatML prompt format, enabling structured multi-turn conversations and system prompts for steerability. This format is compatible with OpenAI API standards, supporting both general chat and specialized function calling/JSON mode interactions. Examples and guidance for implementing function calling and structured outputs are provided, including a dedicated GitHub repository for function calling utilities.