Kazuki1450/Llama-3.2-3B-Instruct_nseq_4_8_clean_1p0_0p0_1p0_grpo_42_rule

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 16, 2026 · Architecture: Transformer

Kazuki1450/Llama-3.2-3B-Instruct_nseq_4_8_clean_1p0_0p0_1p0_grpo_42_rule is a 3.2 billion parameter instruction-tuned causal language model fine-tuned from Meta's Llama-3.2-3B-Instruct. It was trained with GRPO, a reinforcement-learning method originally introduced for mathematical reasoning, and supports a context length of 32,768 tokens. It is adapted for general instruction-following tasks.


Model Overview

This model is a fine-tuned variant of Meta's Llama-3.2-3B-Instruct, published by Kazuki1450: a 3.2 billion parameter instruction-tuned causal language model intended for general instruction-following tasks.

Key Differentiator: GRPO Training

What sets this model apart is its training methodology. It was fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement-learning method first introduced to strengthen mathematical reasoning in large language models in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests potential for improved reasoning or more structured responses compared to standard instruction-tuned models.
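GRPO is implemented in TRL as GRPOTrainer. Below is a minimal, hypothetical sketch of what a run like this could look like; the dataset, reward rule, and hyperparameters are illustrative assumptions (the "rule" and "42" fragments in the model name hint at a rule-based reward and a seed, but the exact recipe is not documented in the card).

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical rule-based reward: score each sampled completion with a
# simple programmatic check instead of a learned reward model.
def rule_based_reward(completions, **kwargs):
    # Reward completions that contain a final-answer marker (illustrative rule only).
    return [1.0 if "####" in completion else 0.0 for completion in completions]

# Toy prompt dataset; GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_dict(
    {"prompt": ["What is 17 * 24? Show your reasoning.",
                "List three prime numbers greater than 50."]}
)

config = GRPOConfig(
    output_dir="llama-3.2-3b-grpo",
    num_generations=4,          # group size: completions sampled per prompt
    max_completion_length=512,
    seed=42,                    # assumed from the "_42_" in the model name
)

trainer = GRPOTrainer(
    model="meta-llama/Llama-3.2-3B-Instruct",  # the base model this card fine-tunes
    reward_funcs=rule_based_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt (num_generations) and normalizes each completion's reward against the group average, which removes the need for a separate value network.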

Technical Specifications

  • Base Model: meta-llama/Llama-3.2-3B-Instruct
  • Parameter Count: 3.2 billion
  • Context Length: 32768 tokens
  • Training Framework: TRL (Transformer Reinforcement Learning)
  • Precision: BF16
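For inference, the checkpoint should load like any Llama-3.2-Instruct derivative. A minimal sketch with Hugging Face Transformers, assuming the model ships its base model's chat template and BF16 weights:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Llama-3.2-3B-Instruct_nseq_4_8_clean_1p0_0p0_1p0_grpo_42_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",
)

# Format the conversation with the (assumed) Llama 3.2 chat template.
messages = [{"role": "user", "content": "Explain GRPO in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```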

When to Use This Model

  • Instruction Following: Suited to applications that need reliable adherence to explicit instructions, such as structured Q&A or constrained task completion.
  • Resource-Constrained Environments: At 3.2B parameters in BF16 (roughly 6.5 GB of weights), it fits on a single consumer GPU, balancing capability against memory and latency.
  • Exploring GRPO Benefits: A convenient checkpoint for developers who want to compare a GRPO-trained model against standard instruction-tuned baselines, particularly on reasoning-style tasks.