google/datagemma-rig-27b-it

Cold
Public
27B
FP8
32768
Aug 27, 2024
License: gemma
Hugging Face
Gated
Overview

DataGemma RIG 27B-IT: Integrating Statistical Data into LLM Responses

DataGemma RIG 27B-IT, developed by Google, is a fine-tuned Gemma 2 model designed to help Large Language Models (LLMs) incorporate public statistical data from Data Commons. It uses a retrieval interleaved generation (RIG) approach: whenever the model mentions a statistic, it is trained to annotate the generated text with a natural language query to the Data Commons interface. These annotations can then be resolved against Data Commons, so that verified statistical information appears directly within the response.

Key Capabilities

  • Statistical Data Integration: Embeds public statistical data from Data Commons into LLM outputs.
  • Retrieval Interleaved Generation (RIG): Annotates generated statistics with [__DC__("<natural language query>") --> "<LLM generated statistic>"] for transparency and verification.
  • Gemma 2 Base: Built upon the Gemma 2 architecture, leveraging its foundational capabilities.
  • Academic and Research Focus: Currently intended for academic and research purposes, with ongoing development.
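
The RIG annotation format listed above can be post-processed with ordinary string tooling. The following is a minimal sketch, assuming annotations appear exactly in the form [__DC__("<query>") --> "<statistic>"] with double-quoted fields that contain no embedded double quotes. The names RIG_PATTERN, RigAnnotation, extract_annotations, and resolve_annotations are hypothetical helpers written for illustration, not part of any DataGemma or Data Commons library, and the lookup callable stands in for whatever Data Commons client is actually used.

```python
import re
from typing import Callable, List, NamedTuple, Optional

# Assumed annotation shape: [__DC__("<query>") --> "<statistic>"]
RIG_PATTERN = re.compile(
    r'\[__DC__\("(?P<query>[^"]*)"\)\s*-->\s*"(?P<stat>[^"]*)"\]'
)


class RigAnnotation(NamedTuple):
    """One annotation: the Data Commons query and the model's own figure."""
    query: str
    llm_stat: str


def extract_annotations(text: str) -> List[RigAnnotation]:
    """Return every RIG annotation found in the text, in order."""
    return [
        RigAnnotation(m.group("query"), m.group("stat"))
        for m in RIG_PATTERN.finditer(text)
    ]


def resolve_annotations(
    text: str, lookup: Callable[[str], Optional[str]]
) -> str:
    """Replace each annotation with lookup(query) when it returns a value,
    falling back to the model-generated statistic when it returns None.

    `lookup` is a placeholder for a real Data Commons call (an assumption).
    """
    def _substitute(match: "re.Match[str]") -> str:
        verified = lookup(match.group("query"))
        return verified if verified is not None else match.group("stat")

    return RIG_PATTERN.sub(_substitute, text)


if __name__ == "__main__":
    # Illustrative text only; the figure below is a made-up placeholder.
    sample = (
        'The region has about [__DC__("what is the population of the '
        'region?") --> "1 million"] residents.'
    )
    print(extract_annotations(sample))
    # With a lookup that finds nothing, the model's own figure is kept.
    print(resolve_annotations(sample, lambda query: None))
```

Keeping extraction separate from resolution makes it possible to log the model's figures next to the retrieved ones, which is useful when checking how often the two disagree.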

Usage and Limitations

This is an early version of the model, fine-tuned on synthetically generated data and intended primarily for academic and research use. It is not yet ready for commercial or general public use and may exhibit unintended behaviors. The DataGemma paper describes its implementation, evaluation, and known limitations in detail. To reduce its memory footprint, the model can be run with 4-bit quantization using bitsandbytes.
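
Loading the model in 4-bit through bitsandbytes might look like the sketch below. It assumes the transformers, torch, accelerate, and bitsandbytes packages are installed, that access to the gated repository has been granted on Hugging Face, and that a suitable GPU is available. The NF4 quantization type and bfloat16 compute dtype are common choices rather than requirements stated on this page, and load_rig_model_4bit is a hypothetical helper name.

```python
MODEL_ID = "google/datagemma-rig-27b-it"


def load_rig_model_4bit(model_id: str = MODEL_ID):
    """Load the tokenizer and a 4-bit quantized model (sketch, see assumptions).

    Heavy dependencies are imported lazily so this module can be imported
    on machines without a GPU stack installed.
    """
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # Assumed settings: NF4 weights with bfloat16 compute.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",  # let accelerate place layers on available devices
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_rig_model_4bit()
    prompt = "What progress has been made on access to electricity worldwide?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The generated text is expected to contain RIG annotations wherever statistics are mentioned; it still needs to be resolved against Data Commons before the figures can be treated as verified.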