adithash/gemma2b-dolly-qlora-merged

Text Generation · Concurrency Cost: 1 · Model Size: 2.5B · Quant: BF16 · Ctx Length: 8k · Published: Apr 30, 2026 · License: gemma · Architecture: Transformer

The adithash/gemma2b-dolly-qlora-merged model is a 2.5 billion parameter Gemma-2B-IT variant, fine-tuned by adithash using QLoRA on the Databricks Dolly-15k dataset. This fully merged model integrates the adapter weights directly into the base model, enabling single-step loading without PEFT dependencies. It is optimized for instruction-following tasks and is intended as a straightforward starting point for learning and experimenting with fine-tuned LLMs.


Model Overview

This model, adithash/gemma2b-dolly-qlora-merged, is a 2.5 billion parameter instruction-tuned variant of Google's Gemma-2B-IT. It was fine-tuned using QLoRA on the databricks/databricks-dolly-15k dataset, which comprises 14,911 instruction-following samples. A key differentiator is that this is a fully merged model, meaning the QLoRA adapter weights have been fused into the base model. This eliminates the need for PEFT (Parameter-Efficient Fine-Tuning) or separate adapter loading, simplifying deployment and usage compared to adapter-only versions.
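Because the adapter weights are already merged, the model can be loaded like any other Hugging Face checkpoint. The sketch below is a minimal example assuming a standard transformers setup; the model id comes from this card, while the use of the Gemma-IT chat template and the generation settings are assumptions, since the card does not document an expected prompt format.

```python
# Minimal loading sketch (assumes transformers and torch are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "adithash/gemma2b-dolly-qlora-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the BF16 weights listed above
    device_map="auto",
)

# Assumption: the merged model still uses the base Gemma-IT chat template.
messages = [{"role": "user", "content": "Explain QLoRA fine-tuning in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```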

Key Capabilities & Features

  • Standalone Deployment: No PEFT or base model required at inference time; load and run directly.
  • Instruction Following: Fine-tuned specifically for general instruction-following tasks based on the Dolly-15k dataset.
  • Simplified Usage: Offers a single-step loading process, making it easy to integrate into projects.
  • Memory-Efficient Options: Supports 4-bit quantization for reduced VRAM usage on GPUs (see the loading sketch after this list).
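For constrained GPUs, the model can be loaded in 4-bit via bitsandbytes. This is a sketch only: it assumes a CUDA GPU and the bitsandbytes package, and the NF4 settings shown are common defaults rather than values documented by this card.

```python
# 4-bit loading sketch with bitsandbytes (assumed available on a CUDA GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "adithash/gemma2b-dolly-qlora-merged"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumed default, not card-specified
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```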

Intended Use Cases

  • Learning & Experimentation: Ideal for understanding and experimenting with QLoRA fine-tuning and merged LLMs.
  • Portfolio Demonstration: Useful for showcasing end-to-end QLoRA fine-tuning workflows.
  • Starting Point: Can serve as a base for further domain-specific instruction tuning.

Limitations

  • Fine-tuned for only 500 steps, limiting its overall performance compared to fully trained models.
  • As a 2.5B parameter model, its capacity for complex multi-step reasoning is limited.
  • Training sequence length was capped at 256 tokens, which may affect performance on very long prompts.