UMA-4B: Agentic RL Fine-Tuned Model
UMA-4B is a 4-billion-parameter causal language model developed by dp66, distinguished by its agentic reinforcement learning (RL) fine-tuning. It is built on the Qwen3-4B-Instruct-2507 base model and supports a context length of 32,768 tokens.
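One practical consequence of a fixed context length is that long conversations must be trimmed to fit it. The sketch below keeps the most recent turns within a token budget; the whitespace split is a stand-in for the model's real tokenizer, and the budget is scaled down for readability. This is an illustrative sketch, not UMA-4B's actual preprocessing.

```python
# Sketch: keep the most recent conversation turns within a fixed token
# budget, as one would for UMA-4B's 32,768-token context window.
# The whitespace split is a crude stand-in for the model's tokenizer.

def count_tokens(text: str) -> int:
    """Approximate token count; substitute the real tokenizer in practice."""
    return len(text.split())

def trim_history(turns: list[str], budget: int) -> list[str]:
    """Drop the oldest turns until the remaining ones fit the budget."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk newest-first
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))  # restore chronological order

turns = [
    "first turn with several words here",
    "second turn",
    "third and most recent turn",
]
print(trim_history(turns, budget=8))
```

With a budget of 8 stand-in tokens, only the two most recent turns survive; in production the same logic would run against real token counts and the full 32,768-token limit.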
Key Capabilities
- Agentic Task Optimization: The primary differentiator of UMA-4B is its fine-tuning with agentic RL, making it particularly adept at tasks requiring sequential decision-making and interaction.
- Causal Language Modeling: As a causal language model, it generates each token conditioned on the preceding tokens, making it suited to text generation and completion.
- Extended Context Window: With a 32,768-token context length, UMA-4B can process and generate longer, more coherent responses, retaining information over extended interactions.
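The sequential decision-making that agentic RL fine-tuning targets can be sketched as a simple observe-act loop. In the sketch below, `policy` is a stub standing in for a call to UMA-4B, and the environment is a toy counter; both are illustrative assumptions, not part of the model's actual interface.

```python
# Sketch of the observe->act loop that agentic RL fine-tuning optimizes.
# `policy` stands in for a generation call to UMA-4B; the environment is
# a toy counter where the agent should stop once it reaches a target.

def policy(observation: int, target: int) -> str:
    """Stub decision step; in practice this would be a model generation."""
    return "increment" if observation < target else "stop"

def run_episode(target: int, max_steps: int = 10) -> list[str]:
    """Run observe->act steps until the agent stops or the budget runs out."""
    state = 0
    trajectory: list[str] = []
    for _ in range(max_steps):
        action = policy(state, target)
        trajectory.append(action)
        if action == "stop":
            break
        state += 1  # environment transition
    return trajectory

print(run_episode(target=3))
```

RL fine-tuning rewards trajectories like this one for reaching the goal efficiently, which is what makes the model "adept at tasks requiring sequential decision-making and interaction."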
Good For
- Agent-based Applications: Ideal for developing AI agents that need to perform multi-turn conversations, execute complex instructions, or interact with environments.
- Advanced Instruction Following: Its RL fine-tuning suggests enhanced capabilities in understanding and executing nuanced instructions.
- Long-form Content Generation: The large context window makes it suitable for tasks requiring sustained coherence, such as writing articles, summaries, or detailed reports.