stillarrow/qwen2.5-coder-1.5b-instruct__scpo_no_std_code_hidden_only_shortcut_guard

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 7, 2026Architecture:Transformer0.0K Warm

The stillarrow/qwen2.5-coder-1.5b-instruct__scpo_no_std_code_hidden_only_shortcut_guard model is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-Coder-1.5B-Instruct. It was trained using the TRL framework and the GRPO method on the eurus2_rl_coding_hidden_only dataset. This model is specifically optimized for coding tasks, leveraging reinforcement learning techniques for improved performance in code generation and understanding.

Loading preview...

Overview

This model is a specialized 1.5 billion parameter instruction-tuned language model, derived from the Qwen2.5-Coder-1.5B-Instruct architecture. It has been fine-tuned using the TRL (Transformer Reinforcement Learning) framework, specifically employing the GRPO (Generalized Reinforcement Learning with Policy Optimization) method. The training utilized the eurus2_rl_coding_hidden_only dataset, indicating a focus on coding-related tasks.

Key Training Details

  • Base Model: Qwen/Qwen2.5-Coder-1.5B-Instruct
  • Fine-tuning Method: GRPO, as introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
  • Dataset: eurus2_rl_coding_hidden_only
  • Framework: TRL (Transformer Reinforcement Learning)

Potential Use Cases

Given its fine-tuning on a coding-specific dataset and the application of GRPO, this model is likely optimized for:

  • Code generation: Producing code snippets based on natural language instructions.
  • Code completion: Assisting developers by suggesting code.
  • Code understanding: Answering questions related to code logic or functionality.
  • Educational tools: Providing explanations or solutions for coding problems.