Poojan28/chatbot-rag-gemma2
Poojan28/chatbot-rag-gemma2 is a 2.6 billion parameter instruction-tuned causal language model, converted to MLX format from Google's Gemma-2-2B-IT. This model is optimized for efficient deployment and inference on Apple silicon using the MLX framework. It is designed for chatbot and Retrieval Augmented Generation (RAG) applications, leveraging its instruction-following capabilities and 8192 token context length.
Loading preview...
Model Overview
Poojan28/chatbot-rag-gemma2 is a 2.6 billion parameter instruction-tuned language model, derived from Google's gemma-2-2b-it and converted into the MLX format. This conversion was performed using mlx-lm version 0.29.1, specifically targeting efficient execution on Apple silicon.
Key Capabilities
- Instruction Following: Inherits the instruction-tuned capabilities of the base Gemma-2-2B-IT model, making it suitable for conversational AI and task-oriented prompts.
- MLX Optimization: Designed for high-performance inference on Apple devices, leveraging the MLX framework.
- Context Length: Supports an 8192 token context window, allowing for processing longer inputs and maintaining conversational history.
Use Cases
This model is particularly well-suited for:
- Chatbot Development: Building interactive conversational agents that can understand and respond to user queries.
- Retrieval Augmented Generation (RAG): Integrating with external knowledge bases to generate more informed and accurate responses.
- On-device AI: Deploying language model capabilities directly on Apple hardware for local processing and reduced latency.