ftajwar/qwen3_1.7B_Base_MaxRL_Polaris_1000_steps
The ftajwar/qwen3_1.7B_Base_MaxRL_Polaris_1000_steps model is a roughly 2-billion-parameter model fine-tuned from Qwen3-1.7B-Base by Fahim Tajwar and collaborators using the MaxRL (Maximum Likelihood Reinforcement Learning) objective. It was trained with reinforcement learning on the POLARIS-53K dataset. The model is a research checkpoint that demonstrates how MaxRL can be applied to improve language model performance in RL settings.
Model Overview
This model is a roughly 2-billion-parameter fine-tune of the Qwen3-1.7B-Base model, trained by Fahim Tajwar and collaborators with a new approach called Maximum Likelihood Reinforcement Learning (MaxRL). MaxRL is a framework for optimizing a maximum-likelihood objective in reinforcement learning settings.
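A small example can make "maximum likelihood in an RL setting" concrete. The sketch assumes binary correctness rewards r ∈ {0, 1}, so a prompt's success probability is p = E[r]. Standard RL maximizes p, while a maximum-likelihood objective maximizes log p. Because ∇log p = ∇p / p, the second objective rescales the policy gradient by 1/p, which puts more weight on prompts the model rarely solves. The function names and the plug-in estimate p ≈ K/N (K successes out of N samples) are illustrative assumptions. This is not the MaxRL paper's estimator or reference code.

```python
from typing import List

# Illustrative sketch only; NOT the MaxRL reference implementation.
# Assumption: each prompt gets N sampled answers with binary rewards in {0, 1}.


def expected_reward_weights(rewards: List[int]) -> List[float]:
    """Per-sample REINFORCE weights for maximizing expected reward p = E[r].

    grad p ~= (1/N) * sum_i r_i * grad log pi(y_i), so sample i gets r_i / N.
    """
    n = len(rewards)
    return [r / n for r in rewards]


def log_likelihood_weights(rewards: List[int]) -> List[float]:
    """Per-sample weights for maximizing log p (assumed binary rewards).

    grad log p = grad p / p. Substituting the plug-in estimate p ~= K / N
    (K = number of correct samples) turns the 1/N weights into 1/K weights
    on the correct samples only. If K == 0 there is no learning signal.
    """
    k = sum(rewards)
    if k == 0:
        return [0.0] * len(rewards)
    return [r / k for r in rewards]
```

Consider a hard prompt where only 1 of 4 samples is correct. Here `expected_reward_weights([1, 0, 0, 0])` gives `[0.25, 0.0, 0.0, 0.0]`, while `log_likelihood_weights([1, 0, 0, 0])` gives `[1.0, 0.0, 0.0, 0.0]`. The rare success receives four times the weight under the log-likelihood objective.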
Key Capabilities & Training
- Base Model: Fine-tuned from the Qwen/Qwen3-1.7B-Base model.
- MaxRL Objective: Optimized with the MaxRL objective described in the authors' paper, "Maximum Likelihood Reinforcement Learning".
- Training Data: Trained on the POLARIS-53K dataset.
- Computational Resources: Fine-tuned on 32 NVIDIA H200 GPUs for 1000 training steps; this checkpoint is released as a research artifact.
Good For
- Researchers exploring advanced reinforcement learning techniques for language models.
- Experiments with models fine-tuned using the MaxRL objective.
- Applications that need a Qwen3-based model trained with RL-driven optimization.