Poojan28/chatbot-rag-gemma2

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:2.6BQuant:BF16Ctx Length:8kPublished:May 12, 2026License:gemmaArchitecture:Transformer Warm

Poojan28/chatbot-rag-gemma2 is a 2.6 billion parameter instruction-tuned causal language model, converted to MLX format from Google's Gemma-2-2B-IT. This model is optimized for efficient deployment and inference on Apple silicon using the MLX framework. It is designed for chatbot and Retrieval Augmented Generation (RAG) applications, leveraging its instruction-following capabilities and 8192 token context length.

Loading preview...

Model Overview

Poojan28/chatbot-rag-gemma2 is a 2.6 billion parameter instruction-tuned language model, derived from Google's gemma-2-2b-it and converted into the MLX format. This conversion was performed using mlx-lm version 0.29.1, specifically targeting efficient execution on Apple silicon.

Key Capabilities

  • Instruction Following: Inherits the instruction-tuned capabilities of the base Gemma-2-2B-IT model, making it suitable for conversational AI and task-oriented prompts.
  • MLX Optimization: Designed for high-performance inference on Apple devices, leveraging the MLX framework.
  • Context Length: Supports an 8192 token context window, allowing for processing longer inputs and maintaining conversational history.

Use Cases

This model is particularly well-suited for:

  • Chatbot Development: Building interactive conversational agents that can understand and respond to user queries.
  • Retrieval Augmented Generation (RAG): Integrating with external knowledge bases to generate more informed and accurate responses.
  • On-device AI: Deploying language model capabilities directly on Apple hardware for local processing and reduced latency.