harsha070/exp2-qwen-island-s42-lambda-0p35

Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Context Length: 32k · Published: May 4, 2026 · Architecture: Transformer

harsha070/exp2-qwen-island-s42-lambda-0p35 is a 3.1 billion parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with GRPO, the reinforcement learning method introduced in the DeepSeekMath paper, and supports a context length of 32768 tokens.


Model Overview

The harsha070/exp2-qwen-island-s42-lambda-0p35 is a 3.1 billion parameter language model, fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model. It was developed using the TRL library and incorporates the GRPO (Group Relative Policy Optimization) training method.
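As a standard Transformers checkpoint, the model can be loaded with `AutoModelForCausalLM`. The snippet below is a minimal sketch: the repository id and BF16 precision come from this card, while the prompt and generation settings are illustrative defaults rather than documented values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "harsha070/exp2-qwen-island-s42-lambda-0p35"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",           # requires the accelerate package
)

# Illustrative prompt; generation settings are not documented on the card.
prompt = "Explain the Pythagorean theorem in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```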

Key Capabilities

  • Enhanced Training Method: Utilizes GRPO, a technique detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), suggesting potential improvements in reasoning or specific task performance.
  • Instruction-Tuned: Built upon an instruction-tuned base model, making it suitable for following user prompts and generating coherent responses (see the chat-template sketch after this list).
  • Large Context Window: Supports a context length of 32768 tokens, allowing it to process and generate longer texts while maintaining conversational history or detailed instructions.
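For instruction-following use, prompts are typically formatted with the tokenizer's built-in chat template. The sketch below reuses the `model` and `tokenizer` objects from the loading example above; the messages themselves are illustrative, not part of the card.

```python
# Multi-turn chat via the tokenizer's chat template (inherited from Qwen2.5).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the GRPO training method in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```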

Training Details

The model was fine-tuned with the TRL framework, using TRL 1.3.0, Transformers 5.7.0, PyTorch 2.11.0, Datasets 4.8.5, and Tokenizers 0.22.2. The application of GRPO is the notable aspect of its fine-tuning, aiming to refine performance beyond the base Qwen model; a sketch of what such a run looks like follows.
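The card does not document the training data, reward function, or hyperparameters used for this run. The sketch below shows the general shape of a GRPO fine-tune with TRL's `GRPOTrainer`; the dataset and reward function are placeholders taken from TRL's examples, not the recipe behind this model.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset; the real training data is unspecified.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward that prefers completions near 200 characters. The actual
# reward used to train this model is unknown.
def reward_len(completions, **kwargs):
    return [-abs(200 - len(c)) for c in completions]

# Illustrative hyperparameters only.
training_args = GRPOConfig(
    output_dir="exp2-qwen-grpo",
    max_completion_length=256,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",  # the base model named on this card
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```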