LLucass/TT_L0.2_H0.2_grpo
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jun 8, 2025 · Architecture: Transformer

LLucass/TT_L0.2_H0.2_grpo is a 1.5-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained with the GRPO method on the knoveleng/open-rs dataset and specializes in mathematical reasoning. The model supports a 32,768-token context length, making it suitable for applications that require robust mathematical problem-solving.
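A minimal usage sketch with the Hugging Face `transformers` library is shown below. The chat-style prompt format is an assumption based on the DeepSeek-R1-Distill-Qwen-1.5B base model (which emits its chain of thought before the final answer); consult the base model's card for the exact template. The `solve` helper and its parameters are illustrative, not part of this model's published API.

```python
# Hedged sketch: querying LLucass/TT_L0.2_H0.2_grpo for math reasoning.
# Assumes the R1-distill chat template; verify against the base model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "LLucass/TT_L0.2_H0.2_grpo"


def build_messages(problem: str) -> list[dict]:
    # R1-distilled models typically take the raw problem as a single
    # user turn and reason step by step before answering.
    return [{"role": "user", "content": problem}]


def solve(problem: str, max_new_tokens: int = 1024) -> str:
    # Hypothetical helper: load in BF16 (the published quantization)
    # and generate a completion for one math problem.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    input_ids = tokenizer.apply_chat_template(
        build_messages(problem), add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

For example, `solve("What is the sum of the first 10 positive integers?")` would return the model's reasoning followed by its answer; inputs plus generated tokens must stay within the 32,768-token context window.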
