Overview
Nexura-Gemma-2B: A Fine-Tuned & DPO-Aligned Gemma-2B Model
Nexura-Gemma-2B is a specialized variant of Google's Gemma-2B, developed by arunvpp05. This 2-billion-parameter, decoder-only transformer is distinguished by its two-stage training approach: initial Supervised Fine-Tuning (SFT) on diverse, high-quality instruction datasets (including Alpaca, Dolly-15k, and filtered samples from Lamini, IGN, and UltraChat), followed by Direct Preference Optimization (DPO) for robust alignment. The DPO stage leverages preference datasets such as Anthropic HH-RLHF, Stanford SHP, UltraFeedback, and JudgeLM.
Key Capabilities & Features
- Optimized Instruction Following: Trained to adhere strictly to an XML-style instruction format (<user>{instruction}</user><assistant>{response}) for consistent and stable output.
- Lightweight & Efficient: At 2 billion parameters, it offers fast inference on consumer GPUs (8 GB+ VRAM recommended; it can also run in 4-bit quantized mode).
- Strong Alignment: DPO training ensures clean behavior and stable responses, particularly when the specified prompt format is followed.
- General-Purpose Text Generation: Designed for a wide array of tasks, from chat assistance to content rewriting.
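The XML-style instruction format described above can be wrapped in small helper functions. This is a minimal sketch; the function names and the stop-tag trimming logic are illustrative assumptions, not part of any published API for this model.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the model's XML-style chat format."""
    return f"<user>{instruction}</user><assistant>"

def extract_response(generated: str, prompt: str) -> str:
    """Strip the prompt prefix and any trailing closing tag from raw model output.

    Assumes the model closes its turn with </assistant>, per the
    format shown in the model card.
    """
    response = generated[len(prompt):]
    end = response.find("</assistant>")
    return (response[:end] if end != -1 else response).strip()

# Example: build a prompt, then recover the reply from a raw completion.
prompt = build_prompt("Summarize the water cycle in one sentence.")
raw = prompt + "Water evaporates, condenses into clouds, and returns as precipitation.</assistant>"
reply = extract_response(raw, prompt)
```

Keeping prompt construction in one place makes it easy to guarantee the strict format the model was trained on, which the Limitations section below notes is important for output quality.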
Recommended Use Cases
- Chat assistants and instruction-following applications.
- Educational Q&A and reasoning tasks.
- Coding assistance and content summarization/rewriting.
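For running these tasks on a consumer GPU, the model can be loaded in 4-bit quantized mode via the Transformers `BitsAndBytesConfig`. The sketch below is a configuration fragment, not a tested recipe; the repository id `arunvpp05/Nexura-Gemma-2B` is an assumption based on the author and model name, and `bitsandbytes` plus a CUDA GPU are required.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical Hugging Face repo id inferred from the model card.
model_id = "arunvpp05/Nexura-Gemma-2B"

# 4-bit NF4 quantization keeps the 2B model well within 8 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Use the model's XML-style prompt format for stable output.
prompt = "<user>Explain DPO in two sentences.</user><assistant>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Full-precision loading works too on larger GPUs; 4-bit mode simply trades a small amount of quality for a roughly 4x reduction in weight memory.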
Limitations
- Requires strict adherence to its XML-style prompt format; deviations may lead to unstable or hallucinated output.
- Not multilingual and has limited knowledge compared to larger LLMs, with no factual updates post-2023 (inherent Gemma limitation).
This model is licensed under the Gemma License, permitting both research and commercial use with attribution to Google.