ftajwar/qwen3_1.7B_Base_MaxRL_Polaris_1000_steps

Task: Text Generation
Concurrency Cost: 1
Model Size: 2B
Quant: BF16
Ctx Length: 32k
Published: Feb 26, 2026
License: MIT
Architecture: Transformer
Availability: Open Weights, Warm

The ftajwar/qwen3_1.7B_Base_MaxRL_Polaris_1000_steps model is a fine-tune of Qwen3-1.7B-Base (about 1.7 billion parameters, listed as 2B) by Fahim Tajwar and collaborators, trained with the MaxRL (Maximum Likelihood Reinforcement Learning) objective on the POLARIS-53K dataset. It is a research checkpoint intended to demonstrate MaxRL as a reinforcement learning objective for language models, not a general-purpose release.


Model Overview

This model starts from the Qwen3-1.7B-Base checkpoint (listed as 2B parameters) and was fine-tuned by Fahim Tajwar and collaborators with Maximum Likelihood Reinforcement Learning (MaxRL). MaxRL is a framework for optimizing a maximum likelihood objective in reinforcement learning settings.

Key Capabilities & Training

Good For

  • Researchers studying reinforcement learning objectives for language models.
  • Experiments that need a checkpoint fine-tuned with the MaxRL objective, for example as a baseline or comparison point.
  • Applications that need a small Qwen3-based model trained with RL on POLARIS-53K.
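
For readers who want to try the checkpoint, the following is a minimal sketch of loading it with the Hugging Face `transformers` library, assuming the repository exposes standard causal language model weights and a tokenizer. The helper names `load_model` and `generate` are hypothetical and chosen for illustration; BF16 is used because the listing reports that precision. Since this is a base model fine-tune, the sketch passes a plain-text prompt rather than a chat template, which is also an assumption.

```python
MODEL_ID = "ftajwar/qwen3_1.7B_Base_MaxRL_Polaris_1000_steps"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and BF16 weights (hypothetical helper, not from the model card)."""
    # Imported lazily so that defining the helpers does not require torch.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Question: What is 12 * 13?\nAnswer:")` downloads the weights on first use and returns the model's continuation. The prompt format here is only an illustration; the format used during training is not stated in this listing.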