harsha070/exp2-qwen-island-s42-lambda-0p35

Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Context Length: 32k · Published: May 4, 2026 · Architecture: Transformer

harsha070/exp2-qwen-island-s42-lambda-0p35 is a 3.1 billion parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with GRPO, the reinforcement learning method introduced in the DeepSeekMath paper, and supports a context length of 32768 tokens.


Model Overview

The harsha070/exp2-qwen-island-s42-lambda-0p35 is a 3.1 billion parameter language model, fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model. It was developed using the TRL library and incorporates the GRPO (Group Relative Policy Optimization) training method.
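As a standard Transformers checkpoint, the model can be loaded with `AutoModelForCausalLM`. The snippet below is a minimal sketch: the repository id and BF16 precision come from this card, while the prompt and generation settings are illustrative defaults rather than documented values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "harsha070/exp2-qwen-island-s42-lambda-0p35"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",           # requires the accelerate package
)

# Illustrative prompt; generation settings are not documented on the card.
prompt = "Explain the Pythagorean theorem in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```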

Key Capabilities

  • Enhanced Training Method: Utilizes GRPO, a technique detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), suggesting potential improvements in reasoning or specific task performance.
  • Instruction-Tuned: Built upon an instruction-tuned base model, making it suitable for following user prompts and generating coherent responses (see the chat-template sketch after this list).
  • Large Context Window: Supports a context length of 32768 tokens, allowing it to process and generate longer texts while maintaining conversational history or detailed instructions.
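For instruction-following use, prompts are typically formatted with the tokenizer's built-in chat template. The sketch below reuses the `model` and `tokenizer` objects from the loading example above; the messages themselves are illustrative, not part of the card.

```python
# Multi-turn chat via the tokenizer's chat template (inherited from Qwen2.5).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the GRPO training method in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```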

Training Details

The model was fine-tuned with the TRL framework, using TRL 1.3.0, Transformers 5.7.0, PyTorch 2.11.0, Datasets 4.8.5, and Tokenizers 0.22.2. The application of GRPO is the notable aspect of its fine-tuning, aiming to refine performance beyond the base Qwen model; a sketch of what such a run looks like follows.
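The card does not document the training data, reward function, or hyperparameters used for this run. The sketch below shows the general shape of a GRPO fine-tune with TRL's `GRPOTrainer`; the dataset and reward function are placeholders taken from TRL's examples, not the recipe behind this model.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset; the real training data is unspecified.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward that prefers completions near 200 characters. The actual
# reward used to train this model is unknown.
def reward_len(completions, **kwargs):
    return [-abs(200 - len(c)) for c in completions]

# Illustrative hyperparameters only.
training_args = GRPOConfig(
    output_dir="exp2-qwen-grpo",
    max_completion_length=256,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",  # the base model named on this card
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```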