cjiao/golden-goose-qwen2.5-1.5b-instruct-greedy-top

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Apr 17, 2026Architecture:Transformer Cold

The cjiao/golden-goose-qwen2.5-1.5b-instruct-greedy-top is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned by cjiao from the Qwen/Qwen2.5-1.5B-Instruct base model. It utilizes the GRPO training method, originally introduced for mathematical reasoning, and supports a context length of 32768 tokens. This model is specifically adapted for enhanced performance in tasks where the GRPO method's benefits are applicable, such as complex reasoning.

Loading preview...

Model Overview

The cjiao/golden-goose-qwen2.5-1.5b-instruct-greedy-top is a 1.5 billion parameter instruction-tuned language model, developed by cjiao. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, leveraging the TRL framework for its training process.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was fine-tuned using GRPO (Greedy Policy Optimization), a method highlighted in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an optimization for tasks that benefit from advanced reasoning capabilities, potentially including mathematical or logical problem-solving.

Technical Specifications

  • Base Model: Qwen2.5-1.5B-Instruct
  • Parameter Count: 1.5 billion
  • Context Length: 32768 tokens
  • Training Framework: TRL (Transformers Reinforcement Learning)

Potential Use Cases

Given its GRPO-based fine-tuning, this model could be particularly effective for:

  • Reasoning-intensive tasks: Applications requiring logical deduction or problem-solving.
  • Instruction following: Benefiting from its instruction-tuned base.
  • Long-context applications: Utilizing its substantial 32K token context window.