cjiao/goldengoose-method-v2-bm25-100

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

The cjiao/goldengoose-method-v2-bm25-100 model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct using the GRPO method, which is designed to enhance mathematical reasoning in language models. With a context length of 32,768 tokens, it is suited to tasks that demand robust mathematical problem-solving and logical deduction over long inputs.


Model Overview

The cjiao/goldengoose-method-v2-bm25-100 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. Its development used the TRL framework and the GRPO (Group Relative Policy Optimization) training method. GRPO is a technique introduced specifically to improve mathematical reasoning in large language models, as detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
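The core idea of GRPO is to replace a learned value baseline with group statistics: several completions are sampled per prompt, and each completion's reward is normalized against the mean and standard deviation of its own group. A minimal sketch of that normalization step (binary rewards and the `eps` stabilizer are illustrative assumptions, not details from this model card):

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each completion's reward against its group's mean and std.

    This is the group-relative baseline at the heart of GRPO: no learned
    value network, just statistics over a group of sampled completions.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled answers to one math problem, scored 1.0 if correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers get positive advantage, incorrect ones negative.
```

Because the baseline is computed per group, a completion is reinforced only insofar as it beats its siblings for the same prompt, which is what makes the method cheap enough to run without a separate critic model.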

Key Capabilities

  • Enhanced Mathematical Reasoning: Specialized training with GRPO aims to improve the model's ability to handle mathematical problems and logical deductions.
  • Instruction Following: As a fine-tuned instruction model, it is designed to respond effectively to user prompts and instructions.
  • Context Handling: Supports a substantial context length of 32768 tokens, allowing for processing and reasoning over longer inputs.
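Since the model follows the Qwen2.5 chat format, it can be queried through the standard Hugging Face Transformers chat-template flow. A minimal usage sketch (the system prompt, sample question, and generation settings are illustrative assumptions; the download and generation run only when executed directly):

```python
def build_messages(question: str) -> list[dict]:
    # The system prompt is an illustrative assumption, not part of the
    # model card; adjust it to your task.
    return [
        {"role": "system", "content": "You are a careful mathematical problem solver."},
        {"role": "user", "content": question},
    ]

if __name__ == "__main__":
    # Heavy imports and the model weight download happen only when run directly.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "cjiao/goldengoose-method-v2-bm25-100"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    inputs = tokenizer.apply_chat_template(
        build_messages("A train travels 120 km in 1.5 hours. What is its average speed?"),
        add_generation_prompt=True,
        return_tensors="pt",
    )
    output = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Loading in BF16 matches the quantization listed above; the 32k context window leaves ample room for long multi-step problems in the prompt.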

Training Details

The model was trained with the TRL library (version 0.19.1), using Transformers 4.57.6, PyTorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2. The GRPO method, central to its training, is documented in the DeepSeekMath paper.
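GRPO training in TRL is driven by reward functions that score sampled completions. The card does not state which rewards were used for this model, so the following is purely an illustrative assumption: a rule-based reward that checks whether the completion's final `\boxed{...}` expression matches a reference answer, a common pattern for math fine-tuning. (TRL's `GRPOTrainer` passes dataset columns to reward functions via keyword arguments; the signature here is simplified.)

```python
import re

def boxed_answer_reward(completions: list[str], answers: list[str]) -> list[float]:
    """Score 1.0 when the completion's last \\boxed{...} matches the reference.

    Illustrative rule-based reward; the actual reward used to train this
    model is not documented on the card.
    """
    rewards = []
    for completion, answer in zip(completions, answers):
        matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
        correct = bool(matches) and matches[-1].strip() == answer.strip()
        rewards.append(1.0 if correct else 0.0)
    return rewards

# Example: two sampled completions for a problem whose answer is 42.
scores = boxed_answer_reward(
    ["So the result is \\boxed{42}.", "I believe \\boxed{41}."],
    ["42", "42"],
)
# → [1.0, 0.0]
```

Taking only the last `\boxed{...}` match rewards the final answer rather than intermediate scratch work, which keeps the signal aligned with answer correctness.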

Use Cases

This model is particularly well-suited for applications requiring strong mathematical problem-solving, logical reasoning, and accurate instruction following within a substantial context window. Its specialized training makes it a candidate for tasks where numerical and logical precision are critical.