kmseong/llama2_7b_chat-WaRP-gsm8k-FT-lr3e-5_ssft_5e-5

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:Apr 20, 2026License:llama3.2Architecture:Transformer Warm

The kmseong/llama2_7b_chat-WaRP-gsm8k-FT-lr3e-5_ssft_5e-5 model is a 7 billion parameter Llama 2 Chat variant, fine-tuned with a Weight space Rotation Process (WaRP) and further optimized on the GSM8K dataset. This model incorporates per-layer application of attention (q,k,v) and MLP (up, down) modifications, followed by non-freeze training. It is specifically designed for enhanced performance in mathematical reasoning tasks, leveraging its specialized fine-tuning approach.

Loading preview...

Model Overview

The kmseong/llama2_7b_chat-WaRP-gsm8k-FT-lr3e-5_ssft_5e-5 is a 7 billion parameter model based on the Llama 2 Chat architecture. It has undergone a specialized fine-tuning process involving a "Weight space Rotation Process" (WaRP) and subsequent training on the GSM8K dataset, which is known for mathematical reasoning problems. The model's training methodology includes applying modifications to the attention mechanism (query, key, value) and MLP layers (up, down) on a per-layer basis, followed by a non-freeze training phase.

Key Characteristics

  • Architecture: Llama 2 Chat (7B parameters)
  • Fine-tuning: Utilizes a Weight space Rotation Process (WaRP) for safety alignment, as described in the associated citation.
  • Optimization: Further fine-tuned on the GSM8K dataset, indicating a focus on mathematical problem-solving capabilities.
  • Training Method: Incorporates per-layer adjustments to attention and MLP components, followed by a non-freeze training approach.

Ideal Use Cases

This model is particularly well-suited for applications requiring:

  • Mathematical Reasoning: Its fine-tuning on GSM8K suggests strong performance in arithmetic and logical problem-solving.
  • Chat-based Interactions: As a Llama 2 Chat variant, it retains conversational abilities.
  • Research into WaRP: Developers interested in the "Weight space Rotation Process" for safety alignment may find this model a relevant case study.