doupari/llama3.1_8b_sft-solo-attn-v2-k28

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Apr 30, 2026 | Architecture: Transformer | Warm

doupari/llama3.1_8b_sft-solo-attn-v2-k28 is an 8-billion-parameter language model based on the Llama 3.1 architecture, fine-tuned with a solo attention mechanism. It is derived from a DeepSpeed ZeRO checkpoint and uses meta-llama/Llama-3.1-8B as its backbone. It is designed for causal language modeling and offers a 32768-token context length.


Overview

doupari/llama3.1_8b_sft-solo-attn-v2-k28 is an 8-billion-parameter language model built on the Llama 3.1 architecture. It incorporates a solo attention mechanism and was produced from a DeepSpeed ZeRO checkpoint. Its backbone is meta-llama/Llama-3.1-8B, so it shares that model's architecture and tokenizer family.

Key Characteristics

  • Architecture: Llama 3.1-8B base with solo attention (v2-k28 variant).
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports a context window of 32768 tokens.
  • Origin: Derived from a DeepSpeed ZeRO checkpoint. ZeRO is DeepSpeed's scheme for sharding optimizer state, gradients, and (at stage 3) parameters across devices during training.
  • Tokenizer: Uses a tokenizer compatible with Llama 3.1 models, copied from the llama3.1_8b_sft-solo-attn-v2-k24 variant.
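
For a causal language model, the context window generally has to hold the prompt and the generated tokens together, so the 32768-token figure above is a shared budget. The following is a minimal sketch of that arithmetic; `generation_budget` is a hypothetical helper name chosen for illustration, not part of any library or of this model's repository.

```python
CONTEXT_LENGTH = 32768  # context window stated in this model card


def generation_budget(prompt_tokens: int, requested_new_tokens: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens fit in the window after the prompt.

    Hypothetical helper: clamps the requested generation length so that
    prompt_tokens + result never exceeds context_length.
    """
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, leaving no room in a "
            f"{context_length}-token window"
        )
    # Never hand out more tokens than the window has left.
    return min(requested_new_tokens, remaining)
```

The returned value could be passed as `max_new_tokens` when generating, with `prompt_tokens` taken from the length of the tokenized prompt.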

Usage

This model is suited to causal language modeling, that is, next-token text generation. It can be loaded with the AutoModelForCausalLM and AutoTokenizer classes from the Hugging Face transformers library.
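
A minimal loading and generation sketch with those two classes might look like the following. It assumes the repository loads with stock transformers code; if the solo attention variant ships custom modeling code, extra arguments may be needed, which the card does not say. The function names `load_model_and_tokenizer` and `complete` are illustrative, and the dtype and decoding settings are assumptions rather than recommendations from the model's author.

```python
MODEL_ID = "doupari/llama3.1_8b_sft-solo-attn-v2-k28"


def load_model_and_tokenizer(model_id: str = MODEL_ID):
    """Download (or read from cache) the tokenizer and causal LM weights."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype="auto" uses the dtype recorded in the checkpoint config
    # (assumption: that is preferable to the float32 default here).
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model


def complete(prompt: str, tokenizer, model, max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # generate() returns prompt + continuation; slice off the prompt tokens.
    prompt_len = inputs["input_ids"].shape[1]
    new_ids = output_ids[0, prompt_len:]
    return tokenizer.decode(new_ids, skip_special_tokens=True)


# Example call (downloads roughly 8B parameters of weights, so not run here):
# tokenizer, model = load_model_and_tokenizer()
# print(complete("The capital of France is", tokenizer, model))
```

Because the base is a pretrained backbone fine-tuned for causal language modeling, the sketch passes a plain text prompt rather than applying a chat template; whether a chat template is appropriate is not stated in the card.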