Name: lsteno/Qwen3-4B-Instruct-2507-RLM-RLVR-FullFT-lr5e-6-depth1-v1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: lsteno

Model Overview

The lsteno/Qwen3-4B-Instruct-2507-RLM-RLVR-FullFT-lr5e-6-depth1-v1 is a 4 billion parameter instruction-tuned language model. It is a specialized checkpoint that has undergone full-parameter fine-tuning using Reinforcement Learning from Human Feedback (RLHF) techniques, specifically RLM (Reinforcement Learning from Model) and RLVR (Reinforcement Learning from Very-Rare) methods.

Key Characteristics

Base Model: Built upon the Qwen/Qwen3-4B-Instruct-2507 architecture.
Fine-tuning: Utilizes a full-parameter RLM RLVR fine-tuning approach, suggesting enhanced instruction following and response quality.
Training Details: The model is a checkpoint from step 150 of a training run, indicating a specific stage of its optimization process.
Prompt Variant: Optimized for the sanjaya_text_depth1_llm_only_v1 prompt variant, which implies a focus on single-turn, LLM-only interactions.
Runtime Environment: Designed for a depth-1 LLM-only RLM harness with plain Gemini subcalls and disabled recursive child RLMs, pointing to a streamlined and controlled inference environment.

Good For

Applications requiring a 4B parameter model with strong instruction-following capabilities due to RLHF fine-tuning.
Use cases that align with the sanjaya_text_depth1_llm_only_v1 prompt structure.
Environments where a depth-1 LLM-only RLM harness is preferred for controlled and efficient inference.

Overview

Model Overview

Key Characteristics

Good For

Full Model Card (README)