abhinav0231/Lily-1.5b-v0.3

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 11, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

Lily-1.5b-v0.3 is a 1.5 billion parameter instruction-tuned causal language model developed by abhinav0231, based on a Qwen2-style architecture. This model is distilled from a larger teacher model and specifically fine-tuned for high-quality, long-form assistant responses with structured reasoning, often using explicit and blocks. It excels at instruction following and generating stepwise, tutor-like outputs, making it suitable for structured conversational AI experiments.

Loading preview...

Model Overview

Lily-1.5b-v0.3 is a 1.5 billion parameter instruction-tuned language model developed by abhinav0231. It is a distilled version of abhinav0231/Lily-1.5b-v0.1, fine-tuned on the abhinav0231/Sarvam-105b-Distill-100k dataset. The model utilizes a Qwen2-style architecture with 28 layers, a hidden size of 1536, and 12 attention heads.

Key Capabilities

  • Structured Response Generation: Trained extensively on ChatML conversations featuring explicit <think> and <answer> blocks, enabling it to produce detailed, stepwise, and tutor-like outputs.
  • Instruction Following: Optimized for adhering to instructions, particularly in conversational contexts.
  • Distilled Reasoning: Focuses on generating reasoning-flavored outputs, making it suitable for tasks requiring explanations or breakdowns.
  • Compact Size: At 1.5 billion parameters, it offers usability and efficient inference for lightweight applications.

Training Details

The model was trained using QLoRA and Unsloth on a single NVIDIA A100-SXM4-40GB GPU, leveraging BF16 mixed precision and Flash Attention 2. The training dataset consisted of over 91,000 ChatML-formatted examples, with a mean length of 1640 tokens, emphasizing structured conversational patterns.

Intended Use

This model is ideal for:

  • Instruction-following chat experiments.
  • Generating structured answers and explanations.
  • Research into distilled reasoning-style outputs.
  • Lightweight local or hosted inference where structured, tutor-like responses are desired. It performs best with ChatML-style prompting.