rinna/gemma-2-baku-2b-it

Parameters: 2.6B
Tensor type: BF16
Context length: 8192
License: gemma
Overview

rinna/gemma-2-baku-2b-it is an instruction-tuned variant of the rinna/gemma-2-baku-2b base model, with 2.6 billion parameters and an 8192-token context length. Developed by rinna, it uses the 26-layer transformer architecture of the Gemma 2 family.

Key Capabilities and Training

  • Instruction Following: The model's instruction-following capabilities were added through a "chat vector" procedure: the chat vector was computed by subtracting the parameter vectors of google/gemma-2-2b from those of google/gemma-2-2b-it, and this difference was then added to the rinna/gemma-2-baku-2b base model (a sketch of the arithmetic follows this list).
  • Preference Optimization: Further refinement was achieved using Odds Ratio Preference Optimization (ORPO) on a subset of rinna's internal datasets, optimizing the model's responses based on preferences.
  • Gemma 2 Chat Format: It is designed to adhere to the standard Gemma 2 chat format, ensuring compatibility and consistent interaction patterns.
  • Tokenization: The model utilizes the original google/gemma-2-2b-it tokenizer.
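
The chat-vector step described above amounts to simple parameter arithmetic. The sketch below is a minimal illustration, assuming all three checkpoints share identical parameter names and shapes and that every tensor (including embeddings) is merged the same way; rinna's actual procedure may differ in such details, and the output directory name is hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the three checkpoints named on the model card (requires enough memory for all three).
base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b", torch_dtype=torch.bfloat16)
inst = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it", torch_dtype=torch.bfloat16)
baku = AutoModelForCausalLM.from_pretrained("rinna/gemma-2-baku-2b", torch_dtype=torch.bfloat16)

base_sd, inst_sd = base.state_dict(), inst.state_dict()
merged_sd = {}
for name, baku_param in baku.state_dict().items():
    # chat vector = (instruction-tuned Gemma 2) - (base Gemma 2);
    # adding it to the baku base model transfers instruction-following behaviour.
    chat_vector = inst_sd[name] - base_sd[name]
    merged_sd[name] = baku_param + chat_vector

baku.load_state_dict(merged_sd)
baku.save_pretrained("gemma-2-baku-2b-chat-vector")  # hypothetical output directory
```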

Usage Recommendations

  • When performing batch inference in bfloat16, use eager attention (attn_implementation="eager") to avoid the NaN values that the default attention implementation can produce for padded input sequences in Gemma 2 models (see the example below).
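
A minimal inference sketch following this recommendation is shown below, assuming a recent transformers release with Gemma 2 support; the prompt and generation settings are illustrative rather than taken from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="eager",  # avoids NaNs with padded bf16 batches in Gemma 2
)

# Build the prompt with the standard Gemma 2 chat template bundled with the tokenizer.
chat = [{"role": "user", "content": "自己紹介をしてください。"}]  # illustrative user message
input_ids = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```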

Good for

  • Applications requiring a compact (2.6B parameter) instruction-tuned model for general language tasks.
  • Developers working within the Gemma 2 ecosystem who need a model optimized with specific preference tuning techniques.
  • Use cases where efficient inference with bfloat16 is desired, provided the attention implementation is configured as described under Usage Recommendations.