dp66/UMA-4B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Jan 14, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

dp66/UMA-4B is a 4 billion parameter causal language model, fine-tuned using agentic Reinforcement Learning (RL). Built upon the Qwen3-4B-Instruct-2507 base model, it features a 32768 token context length. This model is optimized for agentic tasks, leveraging its RL fine-tuning to enhance performance in complex, multi-step interactions.

Loading preview...

UMA-4B: Agentic RL Fine-Tuned Model

UMA-4B is a 4 billion parameter causal language model developed by dp66, distinguished by its agentic Reinforcement Learning (RL) fine-tuning. This model is built on the robust Qwen3-4B-Instruct-2507 base architecture and supports a substantial context length of 32768 tokens.

Key Capabilities

  • Agentic Task Optimization: The primary differentiator of UMA-4B is its fine-tuning with agentic RL, making it particularly adept at tasks requiring sequential decision-making and interaction.
  • Causal Language Modeling: As a causal language model, it is designed for text generation, completion, and understanding based on preceding tokens.
  • Extended Context Window: With a 32768 token context length, UMA-4B can process and generate longer, more coherent responses, retaining information over extended interactions.

Good For

  • Agent-based Applications: Ideal for developing AI agents that need to perform multi-turn conversations, execute complex instructions, or interact with environments.
  • Advanced Instruction Following: Its RL fine-tuning suggests enhanced capabilities in understanding and executing nuanced instructions.
  • Long-form Content Generation: The large context window makes it suitable for tasks requiring sustained coherence, such as writing articles, summaries, or detailed reports.