Kazuki1450/Llama-3.2-3B-Instruct_geo_3_6_clean_1p0_0p0_1p0_grpo_42_rule
Text generation · Model size: 3.2B · Quantization: BF16 · Context length: 32k · Published: Mar 16, 2026 · Architecture: Transformer

Kazuki1450/Llama-3.2-3B-Instruct_geo_3_6_clean_1p0_0p0_1p0_grpo_42_rule is a 3.2-billion-parameter instruction-tuned language model, fine-tuned from meta-llama/Llama-3.2-3B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper for improving mathematical reasoning. The model is optimized for tasks that require enhanced reasoning, particularly in mathematical contexts, and supports a context length of 32,768 tokens.
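A minimal usage sketch with the Hugging Face `transformers` library, assuming `transformers` and `torch` are installed; the model id is taken from this card, and the prompt is only an illustrative example:

```python
# Minimal sketch: load the model and run one chat-style generation.
# Assumes `transformers` and `torch` are installed and the model
# weights can be downloaded from the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Llama-3.2-3B-Instruct_geo_3_6_clean_1p0_0p0_1p0_grpo_42_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in BF16, matching the quantization listed on this card.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Format a single-turn conversation with the model's chat template.
messages = [{"role": "user", "content": "What is 17 * 24? Show your reasoning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Since the model is instruction-tuned, prompts should go through `apply_chat_template` rather than being passed as raw text.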
