Name: ftajwar/qwen3_1.7B_Base_MaxRL_Polaris_1000_steps API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ftajwar

Model Overview

This model is a fine-tuned variant of Qwen3-1.7B-Base (about 2 billion parameters), trained by Fahim Tajwar and his team with an approach called Maximum Likelihood Reinforcement Learning (MaxRL). MaxRL is a framework for optimizing a maximum likelihood objective within reinforcement learning settings.

Key Capabilities & Training

Fine-tuned Base Model: Derived from the Qwen/Qwen3-1.7B-Base model.
MaxRL Objective: Trained with the objective described in the authors' research paper, "Maximum Likelihood Reinforcement Learning".
Training Data: Trained on the POLARIS-53K dataset.
Computational Resources: Fine-tuned on 32 NVIDIA H200 GPUs for 1000 steps. This release is a research checkpoint.

Good For

Researchers exploring advanced reinforcement learning techniques for language models.
Experiments with models fine-tuned specifically for MaxRL objectives.
Applications requiring a Qwen3-based model with specialized RL-driven optimization.
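
Example Request (sketch)

For readers who want to try the checkpoint through a hosted API, the following is a minimal sketch, not an official client. It assumes the provider exposes an OpenAI-compatible text-completion endpoint at the URL shown and accepts a bearer token; the endpoint URL, the FEATHERLESS_API_KEY variable name, and both helper function names are assumptions for illustration and should be checked against the provider's documentation. Only the model identifier is taken from this page.

```python
import json
import os
import urllib.request

# Model identifier as listed on this page.
MODEL_ID = "ftajwar/qwen3_1.7B_Base_MaxRL_Polaris_1000_steps"

# Assumption: an OpenAI-compatible completions endpoint. Verify before use.
API_URL = "https://api.featherless.ai/v1/completions"


def build_completion_request(prompt, max_tokens=256, temperature=0.7):
    """Return the JSON-serializable body for a plain text-completion call."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, api_key=None, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    # Hypothetical environment variable name, used only when no key is passed.
    api_key = api_key or os.environ["FEATHERLESS_API_KEY"]
    body = json.dumps(build_completion_request(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

The sketch uses a plain completion prompt rather than a chat message list because the checkpoint is derived from a base model; whether a chat template also works is not stated on this page. Calling build_completion_request("Solve: 12 * 7 =", max_tokens=32) returns the request body without touching the network, which makes it easy to inspect before sending.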
