eekay/gemma-2b-it-noised-np0.1-attn-emb-s40

TEXT GENERATIONConcurrency Cost:1Model Size:2.5BQuant:BF16Ctx Length:8kPublished:Jun 18, 2026Architecture:Transformer Cold

The eekay/gemma-2b-it-noised-np0.1-attn-emb-s40 is a 2.5 billion parameter instruction-tuned model based on the Gemma architecture, featuring a context length of 8192 tokens. This model incorporates noise (np0.1) and attention embedding scaling (s40), suggesting experimental modifications to the base Gemma-2B-IT for potentially enhanced robustness or specific performance characteristics. Its primary application would likely involve instruction-following tasks where a compact yet capable model is beneficial.

Loading preview...

Model Overview

The eekay/gemma-2b-it-noised-np0.1-attn-emb-s40 is an instruction-tuned language model built upon the Gemma architecture, featuring approximately 2.5 billion parameters and supporting an 8192-token context window. This particular iteration appears to be an experimental variant, incorporating specific modifications such as noise (np0.1) and attention embedding scaling (s40).

Key Characteristics

  • Base Architecture: Gemma-2B-IT, indicating a foundation in Google's Gemma family of lightweight open models.
  • Parameter Count: 2.5 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports an 8192-token context, suitable for handling moderately long inputs and generating coherent responses.
  • Experimental Modifications: The noised-np0.1-attn-emb-s40 suffix suggests the integration of noise with a probability of 0.1 and attention embedding scaling by a factor of 40. These modifications are likely aimed at exploring improvements in model robustness, generalization, or specific task performance.

Potential Use Cases

Given its instruction-tuned nature and compact size, this model is potentially suitable for:

  • Instruction Following: Executing a wide range of natural language instructions.
  • Resource-Constrained Environments: Deployment where computational resources are limited, such as edge devices or applications requiring fast inference.
  • Experimental Research: Investigating the impact of noise injection and attention embedding scaling on model behavior and performance.