kmseong/llama3.1_8b_base-Safety-FT-lr3e-5

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Apr 14, 2026 | License: llama3.2 | Architecture: Transformer | Warm

kmseong/llama3.1_8b_base-Safety-FT-lr3e-5 is an 8 billion parameter language model based on the Llama 3.1 architecture, with a context length of 32768 tokens. The model card describes per-layer application to the attention (q, k, v) and MLP (up, down) components, followed by non-freeze training. The model is fine-tuned for safety alignment using the Weight space Rotation Process (Warp) named in its citation.


Model Overview

kmseong/llama3.1_8b_base-Safety-FT-lr3e-5 is an 8 billion parameter language model built on the Llama 3.1 base architecture, with a context length of 32768 tokens. It has been fine-tuned for safety alignment.

Key Technical Details

  • Architecture: Llama 3.1 base with 8 billion parameters.
  • Context Length: Supports up to 32768 tokens.
  • Training Methodology: The method is applied per layer to the attention (q, k, v) and MLP (up, down) components, and is followed by a non-freeze training phase.
  • Safety Alignment: The model is explicitly fine-tuned for safety, leveraging a technique referred to as "Weight space Rotation Process" (Warp), as detailed in its associated citation.
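
To make the "q, k, v" and "up, down" components listed above concrete, the following is a minimal sketch of how those weights could be located layer by layer in a Llama-style checkpoint. It assumes the parameter naming used by the Hugging Face Llama implementation (q_proj, k_proj, v_proj, up_proj, down_proj); mapping the card's shorthand onto these names is an assumption, and the helper group_targets_by_layer is hypothetical. It only selects parameter names and does not implement Warp or the training procedure itself.

```python
import re

# Projection names from the Hugging Face Llama implementation.
# ASSUMPTION: the card's "q, k, v" and "up, down" refer to these modules.
TARGET_PROJECTIONS = ("q_proj", "k_proj", "v_proj", "up_proj", "down_proj")

# Matches names such as "model.layers.12.self_attn.q_proj.weight".
_LAYER_PARAM = re.compile(r"model\.layers\.(\d+)\.(?:self_attn|mlp)\.(\w+)\.weight$")


def group_targets_by_layer(param_names):
    """Return {layer_index: [projection, ...]} for the targeted projections."""
    groups = {}
    for name in param_names:
        match = _LAYER_PARAM.search(name)
        if match and match.group(2) in TARGET_PROJECTIONS:
            groups.setdefault(int(match.group(1)), []).append(match.group(2))
    return groups


# Illustrative parameter names only; a real model exposes them through
# model.named_parameters().
example_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.1.mlp.down_proj.weight",
]
print(group_targets_by_layer(example_names))
# → {0: ['q_proj', 'up_proj'], 1: ['down_proj']}
```

Note that o_proj and gate_proj are skipped because the card does not list them.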

Good For

  • Applications requiring a Llama 3.1-based model with enhanced safety characteristics.
  • Use cases where a large context window (32768 tokens) is beneficial.
  • Research and development into safety alignment techniques for large language models.
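
For the use cases above, a minimal loading sketch with the transformers library might look like the following. It assumes the repository is available on the Hugging Face Hub in a transformers-compatible format and that the accelerate package is installed (needed for device_map="auto"). The helpers max_new_tokens_for and generate are hypothetical names introduced for illustration. The budget helper simply keeps the prompt plus the generated tokens within the 32768-token context length stated on this page.

```python
MODEL_ID = "kmseong/llama3.1_8b_base-Safety-FT-lr3e-5"
CONTEXT_LENGTH = 32768  # context length stated on the model page


def max_new_tokens_for(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Tokens left for generation after the prompt (0 if the window is full)."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


def generate(prompt, requested_new_tokens=128):
    """Load the model and generate a continuation that fits the context window."""
    # Imported lazily so the budget helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # ASSUMPTION: accelerate is installed
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    budget = max_new_tokens_for(inputs["input_ids"].shape[1])
    if budget == 0:
        raise ValueError("Prompt already fills the context window")

    output_ids = model.generate(
        **inputs, max_new_tokens=min(requested_new_tokens, budget)
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate("...") downloads the full 8B checkpoint on first use, so it is best run on a machine with enough disk space and accelerator memory.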