swadeshb/Llama-3.2-3B-Instruct-MPO-SKD-V7
Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 23, 2026 · Architecture: Transformer · Status: Warm

swadeshb/Llama-3.2-3B-Instruct-MPO-SKD-V7 is a fine-tuned version of Meta's Llama-3.2-3B-Instruct. This 3-billion-parameter instruction-tuned model was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to strengthen its mathematical reasoning. It is optimized for tasks that demand robust logical and mathematical problem-solving, making it well suited to applications where precise numerical and analytical reasoning is critical.
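As context for the training method named above: GRPO dispenses with a learned value critic and instead scores each sampled completion relative to its group. A minimal sketch of that group-relative advantage computation follows; the function name and the 0/1 correctness rewards are illustrative, not taken from this model's training code.

```python
from statistics import mean, stdev

def grpo_advantages(rewards):
    """Group-relative advantages in the style of DeepSeekMath's GRPO:
    each sampled completion's reward is normalized by the mean and
    standard deviation of its sampling group (no value critic)."""
    mu = mean(rewards)
    sigma = stdev(rewards) or 1.0  # guard against zero-variance groups
    return [(r - mu) / sigma for r in rewards]

# Example: four completions sampled for one math prompt, scored
# 1.0 for a correct final answer and 0.0 otherwise. Correct answers
# receive positive advantages, incorrect ones negative.
advantages = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

These advantages then weight the policy-gradient update for each completion, so the model is pushed toward responses that outperform their own sampling group.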
