dp66/UMA-4B
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Jan 14, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm
dp66/UMA-4B is a 4 billion parameter causal language model, fine-tuned using agentic Reinforcement Learning (RL). Built upon the Qwen3-4B-Instruct-2507 base model, it features a 32768 token context length. This model is optimized for agentic tasks, leveraging its RL fine-tuning to enhance performance in complex, multi-step interactions.
Loading preview...
UMA-4B: Agentic RL Fine-Tuned Model
UMA-4B is a 4 billion parameter causal language model developed by dp66, distinguished by its agentic Reinforcement Learning (RL) fine-tuning. This model is built on the robust Qwen3-4B-Instruct-2507 base architecture and supports a substantial context length of 32768 tokens.
Key Capabilities
- Agentic Task Optimization: The primary differentiator of UMA-4B is its fine-tuning with agentic RL, making it particularly adept at tasks requiring sequential decision-making and interaction.
- Causal Language Modeling: As a causal language model, it is designed for text generation, completion, and understanding based on preceding tokens.
- Extended Context Window: With a 32768 token context length, UMA-4B can process and generate longer, more coherent responses, retaining information over extended interactions.
Good For
- Agent-based Applications: Ideal for developing AI agents that need to perform multi-turn conversations, execute complex instructions, or interact with environments.
- Advanced Instruction Following: Its RL fine-tuning suggests enhanced capabilities in understanding and executing nuanced instructions.
- Long-form Content Generation: The large context window makes it suitable for tasks requiring sustained coherence, such as writing articles, summaries, or detailed reports.