espressovi/BODHI-qwen-3-math-8b-rlvr

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 29, 2026 · License: MIT · Architecture: Transformer · Open Weights · Cold

espressovi/BODHI-qwen-3-math-8b-rlvr is an 8-billion-parameter Qwen-3 model optimized with Reinforcement Learning (RL) on top of its base model, espressovi/BODHI-qwen-3-8b-distil. With a 32,768-token context length, it is designed for strong performance on mathematical reasoning tasks; the RL training targets improved accuracy and reliability on complex quantitative problems.


BODHI-qwen-3-math-8b-rlvr Overview

espressovi/BODHI-qwen-3-math-8b-rlvr is an 8-billion-parameter language model built on the Qwen-3 architecture. It distinguishes itself through its training regimen: Reinforcement Learning (RL) fine-tuning of its predecessor, espressovi/BODHI-qwen-3-8b-distil, geared specifically toward mathematical reasoning and problem-solving.
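As the checkpoint appears to ship with open weights, a minimal usage sketch with the Hugging Face `transformers` library might look like the following. The system prompt and the chat-template assumptions are illustrative, not taken from the model card:

```python
MODEL_ID = "espressovi/BODHI-qwen-3-math-8b-rlvr"


def build_messages(problem: str) -> list[dict]:
    """Wrap a math problem in a chat-style message list.

    The system prompt here is an assumption for illustration,
    not an official recommendation from the model authors.
    """
    return [
        {"role": "system", "content": "You are a careful mathematical reasoner. Show your work."},
        {"role": "user", "content": problem},
    ]


def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Requires `transformers` and `torch`; downloads the 8B checkpoint on first use.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Format the conversation with the model's own chat template.
    inputs = tokenizer.apply_chat_template(
        build_messages(problem), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


# Example (not run here): print(solve("If 3x + 7 = 22, what is x?"))
```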

Key Characteristics

  • Base Architecture: Qwen-3, an advanced transformer-based model.
  • Parameter Count: 8 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a substantial context window of 32768 tokens, beneficial for handling complex and multi-step mathematical problems.
  • Specialized Training: Utilizes Reinforcement Learning (RL) to refine its mathematical reasoning abilities, aiming for improved accuracy and robustness in quantitative tasks.

Ideal Use Cases

  • Mathematical Problem Solving: Excels in tasks requiring logical deduction and numerical computation.
  • Quantitative Analysis: Suitable for applications involving data interpretation and mathematical modeling.
  • Educational Tools: Can be integrated into platforms for generating explanations or solving math-related queries.