InosLihka/rhythm-env-meta-trained-iter2

Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quantization: BF16 · Context Length: 32k · Published: Apr 25, 2026 · Architecture: Transformer

InosLihka/rhythm-env-meta-trained-iter2 is a 3.1-billion-parameter language model, fine-tuned from unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit with a 32,768-token context length. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper for enhancing mathematical reasoning. The model is primarily optimized for tasks that require advanced mathematical and logical problem solving.


Overview

InosLihka/rhythm-env-meta-trained-iter2 builds upon the unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit base and distinguishes itself through its specialized training with GRPO, a technique introduced in the DeepSeekMath paper. GRPO replaces PPO's learned value function with a baseline estimated from a group of sampled completions per prompt, and is designed to significantly improve mathematical reasoning in language models.
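To make the training setup concrete, here is a minimal sketch of a GRPO fine-tune of the same base model using the trl library's GRPOTrainer. The dataset, reward function, and hyperparameters are illustrative assumptions; the actual recipe behind this model has not been published.

```python
# Minimal GRPO fine-tuning sketch using trl's GRPOTrainer.
# NOTE: the dataset, reward function, and hyperparameters are illustrative
# assumptions; the actual training recipe for this model is not published.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Example math dataset; GRPOTrainer expects a "prompt" column. The reference
# answer in GSM8K follows a "####" marker, which we extract for the reward.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {
        "prompt": x["question"],
        "target": x["answer"].split("####")[-1].strip(),
    }
)

def exact_answer_reward(completions, target, **kwargs):
    """Reward 1.0 when the reference answer appears in the completion, else 0.0.
    GRPO compares these scores across the group of completions per prompt."""
    return [1.0 if t in c else 0.0 for c, t in zip(completions, target)]

training_args = GRPOConfig(
    output_dir="rhythm-env-grpo",
    num_generations=8,           # completions sampled per prompt (the "group")
    max_completion_length=512,
    per_device_train_batch_size=8,
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    reward_funcs=exact_answer_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

The group size (num_generations) determines how many completions GRPO scores per prompt to form its relative baseline; larger groups give a lower-variance advantage estimate at higher sampling cost.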

Key Capabilities

  • Enhanced Mathematical Reasoning: optimized for complex mathematical and logical problem-solving as a result of its GRPO-based training.
  • Instruction Following: responds effectively to user instructions, inheriting this behavior from its instruction-tuned base.
  • Large Context Window: supports a 32,768-token context length, allowing it to process and generate longer, more detailed responses (a loading sketch follows this list).
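
As a quick start, the following is a minimal inference sketch using Hugging Face transformers; the prompt and generation settings are illustrative assumptions rather than published recommendations.

```python
# Minimal inference sketch with Hugging Face transformers.
# Generation settings are illustrative, not official recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "InosLihka/rhythm-env-meta-trained-iter2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

messages = [
    {"role": "user", "content": "A train travels 120 km in 1.5 hours. "
                                "What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```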

Good for

  • Applications requiring strong mathematical problem-solving.
  • Tasks involving logical deduction and reasoning.
  • Generating detailed and coherent text based on extensive input context.