hamishivi/OpenThinker3-1.5B-RLVE
Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kLicense:apache-2.0Architecture:Transformer0.0K Open Weights Warm

OpenThinker3-1.5B-RLVE is a 1.5 billion parameter language model developed by hamishivi, fine-tuned from OpenThinker3 1.5B using Reinforcement Learning with Verifiable Environments (RLVE). This model demonstrates enhanced performance across various reasoning and problem-solving benchmarks, including AIME, OMEGA-500, OlympiadBench, and LiveCodeBench. It is specifically optimized for complex reasoning tasks and competitive programming challenges, showing significant improvements over its base model.

Loading preview...

OpenThinker3-1.5B-RLVE Overview

OpenThinker3-1.5B-RLVE is a 1.5 billion parameter language model developed by hamishivi, building upon the OpenThinker3 1.5B base model. Its key differentiator is the application of Reinforcement Learning with Verifiable Environments (RLVE), a method detailed in the associated RLVE paper. This training approach aims to improve the model's ability to handle complex reasoning and problem-solving tasks.

Key Capabilities & Performance

The model shows notable performance gains over its predecessor across several challenging benchmarks:

  • AIME 2024 & 2025: Achieves 58.18% and 49.90% respectively, outperforming the base model's 54.32% and 42.03%.
  • OMEGA-500: Scores 29.45% compared to 25.15%.
  • OlympiadBench: Reaches 62.67% against 56.85%.
  • BBEH: Improves to 7.13% from 4.00%.
  • LiveCodeBench-v6: Demonstrates a Pass@8 of 34.07%, up from 28.17%.

These results indicate enhanced capabilities in mathematical reasoning, competitive programming, and general problem-solving.

Intended Use Cases

This model is particularly well-suited for applications requiring:

  • Advanced Reasoning: Tasks that demand logical deduction and multi-step problem-solving.
  • Competitive Programming: Generating or assisting with code solutions for complex algorithmic challenges.
  • Mathematical Problem Solving: Tackling problems similar to those found in math olympiads or advanced tests.

Further training details and evaluation instructions are available in the RLVE GitHub Repository.