Ashapu/anarva-8b-merged

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 16, 2026 | Architecture: Transformer

Ashapu/anarva-8b-merged is an 8-billion-parameter language model created by Ashapu on the Llama 3.1 architecture. It is a DARE TIES merge of four Llama 3.1-8B-based models: NousResearch/Hermes-3-Llama-3.1-8B, deepseek-ai/DeepSeek-R1-Distill-Llama-8B, cognitivecomputations/dolphin-2.9.4-llama3.1-8b, and arcee-ai/Llama-3.1-SuperNova-Lite. The merge is intended to combine the strengths of its constituent models in a general-purpose model for generative tasks, with a context length of 32768 tokens.


Model Overview

Ashapu/anarva-8b-merged is an 8-billion-parameter language model built on the Llama 3.1 architecture. It was created by Ashapu with the mergekit tool, using the DARE TIES merge method. This approach combines the weights of several fine-tuned models that share an architecture, with the aim of carrying their capabilities into a single model without additional training.

Merge Details

The merge covers four Llama 3.1-8B-based models, with NousResearch/Hermes-3-Llama-3.1-8B serving as the primary base. The other merged components are:

  • cognitivecomputations/dolphin-2.9.4-llama3.1-8b
  • arcee-ai/Llama-3.1-SuperNova-Lite
  • deepseek-ai/DeepSeek-R1-Distill-Llama-8B

The DARE TIES configuration assigns each contributing model its own density and weight parameters, chosen to balance the combined performance. The tokenizer configuration also incorporates special tokens from dolphin-2.9.4-llama3.1-8b and DeepSeek-R1-Distill-Llama-8B, so that the chat-format and reasoning markers used by those models are preserved.
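To make the roles of density and weight concrete, here is a minimal sketch of a DARE TIES style merge on toy parameter lists. It is a simplified illustration, not mergekit's implementation and not the configuration used for this model: the function names, the toy values, and the normalization by the weights of agreeing models are all assumptions made for the example.

```python
import random


def dare_prune(delta, density, rng):
    """DARE step: keep each entry with probability `density`,
    then rescale the survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, finetuned, densities, weights, seed=0):
    """Toy DARE TIES merge over flat lists of floats (hypothetical helper).

    base:      parameters of the base checkpoint
    finetuned: one parameter list per contributing model
    densities: fraction of each model's delta kept by DARE
    weights:   per-model merge weights
    """
    rng = random.Random(seed)
    # Task vectors: what each fine-tune changed relative to the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    pruned = [dare_prune(d, p, rng) for d, p in zip(deltas, densities)]

    merged = []
    for i, b in enumerate(base):
        contribs = [w * p[i] for p, w in zip(pruned, weights)]
        # TIES step: elect a sign from the weighted sum of deltas,
        # then keep only the contributions that agree with it.
        sign = 1.0 if sum(contribs) >= 0 else -1.0
        agree = [(c, w) for c, w in zip(contribs, weights) if c * sign > 0]
        weight_sum = sum(w for _, w in agree)
        update = sum(c for c, _ in agree) / weight_sum if weight_sum else 0.0
        merged.append(b + update)
    return merged


# Toy usage with made-up numbers (not the real merge parameters).
toy_base = [0.0, 0.0, 0.0]
toy_models = [[1.0, -1.0, 0.0], [1.0, 2.0, 0.0]]
toy_merged = dare_ties_merge(toy_base, toy_models, [1.0, 1.0], [0.5, 0.5])
```

In this sketch a lower density drops more of a model's changes before merging, while a higher weight gives that model more influence on both the sign election and the final update.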

Key Characteristics

  • Architecture: Llama 3.1-based, 8 billion parameters.
  • Merge Method: DARE TIES, combining four Llama 3.1-8B-based models.
  • Context Length: Supports a context window of 32768 tokens.
  • Tokenizer: Uses a union tokenizer that includes special tokens from the merged models, such as <|im_start|>, <|im_end|>, and <think>.
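
The <|im_start|> and <|im_end|> tokens listed above are the delimiters of the ChatML conversation format. As a rough sketch, assuming the model's chat template follows ChatML (the listing confirms only that the tokens exist, not the template itself), a prompt could be assembled as below. The helper name build_chatml_prompt is hypothetical; in practice the chat template shipped with the tokenizer is the safer source of truth.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Format a list of {"role", "content"} dicts as a ChatML string.

    Assumes a plain ChatML layout; the model's actual template may differ.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_prompt = build_chatml_prompt(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize DARE TIES in one sentence."},
    ]
)
```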

Potential Use Cases

This merged model is intended for general generative AI applications, drawing on the different strengths of its constituent models. Its Llama 3.1 foundation and the instruction-tuned and reasoning-distilled models in the merge suggest potential for reasoning, instruction following, and general language understanding tasks.