rinna/gemma-2-baku-2b-it

Parameters: 2.6B
Tensor type: BF16
Context length: 8192
License: gemma
Overview

rinna/gemma-2-baku-2b-it is an instruction-tuned variant of the rinna/gemma-2-baku-2b base model, with 2.6 billion parameters and an 8192-token context length. Developed by rinna, it uses the 26-layer transformer architecture of the Gemma 2 family.

Key Capabilities and Training

  • Instruction Following: The model's instruction-following capabilities were added through a "chat vector" procedure: the chat vector was computed by subtracting the parameter vectors of google/gemma-2-2b from those of google/gemma-2-2b-it, and this difference was then added to the rinna/gemma-2-baku-2b base model (a sketch of the arithmetic follows this list).
  • Preference Optimization: Further refinement was achieved using Odds Ratio Preference Optimization (ORPO) on a subset of rinna's internal datasets, optimizing the model's responses based on preferences.
  • Gemma 2 Chat Format: It is designed to adhere to the standard Gemma 2 chat format, ensuring compatibility and consistent interaction patterns.
  • Tokenization: The model utilizes the original google/gemma-2-2b-it tokenizer.
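
The chat-vector step described above amounts to simple parameter arithmetic. The sketch below is a minimal illustration, assuming all three checkpoints share identical parameter names and shapes and that every tensor (including embeddings) is merged the same way; rinna's actual procedure may differ in such details, and the output directory name is hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the three checkpoints named on the model card (requires enough memory for all three).
base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b", torch_dtype=torch.bfloat16)
inst = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it", torch_dtype=torch.bfloat16)
baku = AutoModelForCausalLM.from_pretrained("rinna/gemma-2-baku-2b", torch_dtype=torch.bfloat16)

base_sd, inst_sd = base.state_dict(), inst.state_dict()
merged_sd = {}
for name, baku_param in baku.state_dict().items():
    # chat vector = (instruction-tuned Gemma 2) - (base Gemma 2);
    # adding it to the baku base model transfers instruction-following behaviour.
    chat_vector = inst_sd[name] - base_sd[name]
    merged_sd[name] = baku_param + chat_vector

baku.load_state_dict(merged_sd)
baku.save_pretrained("gemma-2-baku-2b-chat-vector")  # hypothetical output directory
```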

Usage Recommendations

  • When performing batch inference in bfloat16, use eager attention (attn_implementation="eager") to avoid the NaN values that the default attention implementation can produce for padded input sequences in Gemma 2 models (see the example below).
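
A minimal inference sketch following this recommendation is shown below, assuming a recent transformers release with Gemma 2 support; the prompt and generation settings are illustrative rather than taken from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="eager",  # avoids NaNs with padded bf16 batches in Gemma 2
)

# Build the prompt with the standard Gemma 2 chat template bundled with the tokenizer.
chat = [{"role": "user", "content": "自己紹介をしてください。"}]  # illustrative user message
input_ids = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```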

Good for

  • Applications requiring a compact (2.6B parameter) instruction-tuned model for general language tasks.
  • Developers working within the Gemma 2 ecosystem who need a model optimized with specific preference tuning techniques.
  • Use cases where efficient inference with bfloat16 is desired, provided the attention implementation is configured as described under Usage Recommendations.