swadeshb/Llama-3.2-3B-Instruct-MPO-SKD-V2
Text Generation | Concurrency Cost: 1 | Model Size: 3.2B | Quant: BF16 | Ctx Length: 32k | Published: Feb 7, 2026 | Architecture: Transformer | Cold

swadeshb/Llama-3.2-3B-Instruct-MPO-SKD-V2 is a 3.2 billion parameter instruction-tuned causal language model, fine-tuned from meta-llama/Llama-3.2-3B-Instruct. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced to improve mathematical reasoning. The model is aimed at tasks that involve multi-step mathematical problem-solving and logical deduction.
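To give a sense of what GRPO does during training, here is a minimal sketch of its core step: several answers are sampled for one prompt, each is scored, and each score is normalized against the mean and standard deviation of its own group. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration. None of this is taken from this model's actual training configuration or from TRL's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the statistics of its own group.

    rewards: scores for several sampled answers to the SAME prompt.
    eps: small constant (assumed value) that avoids division by zero
         when every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math prompt:
# 1.0 = final answer correct, 0.0 = final answer wrong.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that score above the group mean get a positive advantage and are reinforced, while answers below it get a negative one. Because the baseline comes from the group itself, this step needs no separately trained value model.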
