SohamK18/data-cleaning-grpo
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Apr 6, 2026 · Architecture: Transformer
The SohamK18/data-cleaning-grpo model is a 1.5-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method designed to enhance mathematical reasoning. Building on its Qwen2.5 base architecture, the model is particularly suited to tasks that require improved logical and mathematical problem-solving.
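The core idea behind GRPO can be sketched briefly: several completions are sampled per prompt, and each completion's advantage is its reward normalized against the mean and standard deviation of its group, removing the need for a separate value network. The snippet below is a minimal illustration of that advantage computation, not the model's actual training code; the reward values are hypothetical.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of rewards to zero mean and unit standard deviation,
    the group-relative advantage estimate used in GRPO."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled completions for one math prompt, rewarded 1.0 for a
# correct final answer and 0.0 otherwise (a hypothetical binary reward).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions get positive advantage, incorrect ones negative.
```

Completions scoring above the group mean are reinforced and those below are discouraged, which is what steers the policy toward better reasoning traces.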