wvnvwn/llama-2-13b-chat-hf-lr5e-5-resta-0.5

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · Published: Apr 30, 2026 · Architecture: Transformer

The wvnvwn/llama-2-13b-chat-hf-lr5e-5-resta-0.5 model is a 13-billion-parameter language model created with the linear merge method. It combines the base Llama-2-13b-chat-hf with two fine-tuned variants: one targeting GSM8K (mathematical reasoning) and one produced by supervised fine-tuning (the SSFT variant). The merge is designed to combine the strengths of its constituent models, potentially improving mathematical reasoning while retaining general chat capabilities.


Model Overview

The wvnvwn/llama-2-13b-chat-hf-lr5e-5-resta-0.5 is a 13 billion parameter language model derived from the Llama-2-13b-chat-hf family. It was created using the linear merge method via mergekit, combining three distinct models:

  • wvnvwn/llama-2-13b-chat-hf-lr5e-5-gsm8k-lr5e-5: A Llama-2 variant likely fine-tuned on the GSM8K mathematical reasoning dataset (grade-school math word problems).
  • wvnvwn/llama-2-13b-chat-hf-SSFT-lr5e-5: A Llama-2 variant that has undergone supervised fine-tuning (the SSFT suffix in the model name).
  • meta-llama/Llama-2-13b-chat-hf: The foundational Llama-2-13b-chat-hf model.

Merge Configuration

The merge process applied a specific weight to each component model across all 40 layers: wvnvwn/llama-2-13b-chat-hf-lr5e-5-gsm8k-lr5e-5 at 1.0, wvnvwn/llama-2-13b-chat-hf-SSFT-lr5e-5 at 0.5, and meta-llama/Llama-2-13b-chat-hf at -0.5. Because 1.0·GSM8K + 0.5·SSFT − 0.5·base equals GSM8K + 0.5·(SSFT − base), this configuration is equivalent to starting from the GSM8K model and adding half of the delta the SSFT model introduced over the base, enhancing SSFT-derived characteristics without double-counting the shared base weights.
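A mergekit configuration consistent with the weights described above might look like the following sketch. The field names follow mergekit's linear merge format; the `dtype` value is an assumption, since the model card does not state it:

```yaml
merge_method: linear
models:
  - model: wvnvwn/llama-2-13b-chat-hf-lr5e-5-gsm8k-lr5e-5
    parameters:
      weight: 1.0
  - model: wvnvwn/llama-2-13b-chat-hf-SSFT-lr5e-5
    parameters:
      weight: 0.5
  - model: meta-llama/Llama-2-13b-chat-hf
    parameters:
      weight: -0.5
dtype: float16  # assumed; not specified in the model card
```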

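Numerically, a linear merge is just a per-parameter weighted sum over same-shaped tensors. A minimal sketch with toy parameter vectors standing in for real checkpoints (the values and the `linear_merge` helper are illustrative, not from the actual merge):

```python
def linear_merge(models, weights):
    """Element-wise weighted sum across same-length parameter vectors."""
    return [sum(w * p for w, p in zip(weights, params))
            for params in zip(*models)]

# Toy stand-ins for one parameter vector from each checkpoint.
gsm8k = [1.2, 0.8, 0.5, 1.1]  # GSM8K fine-tune
ssft  = [1.0, 1.0, 0.4, 1.3]  # SSFT fine-tune
base  = [1.0, 0.9, 0.5, 1.0]  # base Llama-2-13b-chat-hf

# The weights stated in the model card: 1.0, 0.5, -0.5.
merged = linear_merge([gsm8k, ssft, base], [1.0, 0.5, -0.5])

# Equivalently: the GSM8K model plus half the SSFT-minus-base delta.
delta_view = [g + 0.5 * (s - b) for g, s, b in zip(gsm8k, ssft, base)]
assert all(abs(m - d) < 1e-9 for m, d in zip(merged, delta_view))
```

The assertion makes the equivalence from the paragraph above concrete: subtracting the base at -0.5 exactly cancels half of the SSFT model's copy of the base weights.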
Potential Use Cases

Given its lineage, this merged model could be suitable for applications requiring:

  • General conversational AI: Leveraging the base Llama-2-13b-chat-hf's capabilities.
  • Improved mathematical reasoning: Benefiting from the GSM8K fine-tuned component.
  • Enhanced instruction following: Drawing on the supervised fine-tuning (SSFT) component of the merge.
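For the conversational use case, descendants of Llama-2-13b-chat-hf are typically prompted with Llama-2's `[INST]`/`<<SYS>>` chat template. A minimal prompt-builder sketch; the helper name and default system message are illustrative assumptions:

```python
def build_llama2_prompt(user_msg, system_msg="You are a helpful assistant."):
    """Format a single-turn prompt using Llama-2-chat's instruction template."""
    return (
        f"[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n"
        f"{user_msg} [/INST]"
    )

prompt = build_llama2_prompt("A train travels 120 km in 1.5 hours. What is its average speed?")
```

Tokenizers shipped with Llama-2-chat checkpoints often expose this same template via `tokenizer.apply_chat_template`, which is generally preferable to hand-building the string when available.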