akshayballal/Qwen2.5-3B-Instruct-Pubmed-16bit-GRPO
Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Jan 18, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

akshayballal/Qwen2.5-3B-Instruct-Pubmed-16bit-GRPO is a 3.1 billion parameter instruction-tuned causal language model published by akshayballal. It was fine-tuned from unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit using Unsloth together with Hugging Face's TRL library for faster, more memory-efficient training. With a 32,768 token context length, it targets instruction-following tasks, particularly in the biomedical domain, as indicated by the 'Pubmed' designation in its name.
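A minimal usage sketch with the Hugging Face transformers library is shown below. It assumes the checkpoint is available on the Hub under the identifier above and follows the standard Qwen2.5 chat template; the prompt, dtype, and device placement are illustrative and should be adjusted to your hardware.

```python
# Sketch: load the model and run a single chat-style generation.
# Assumptions: the repo id resolves on the Hugging Face Hub and ships
# BF16 weights compatible with AutoModelForCausalLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "akshayballal/Qwen2.5-3B-Instruct-Pubmed-16bit-GRPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 quantization per the model card
    device_map="auto",
)

# Example biomedical instruction (hypothetical prompt).
messages = [
    {"role": "user", "content": "Summarize the main finding of this abstract: ..."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```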
