typhoon-ai/typhoon-s-4b-nitibench-ccl-legal-agent-research-preview

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Jan 19, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

Typhoon-S-4B NitiBench-CCL Legal Agent is a 4-billion-parameter research artifact from typhoon-ai, built to explore domain-specific sovereignty in Thai legal reasoning. It uses InK-GRPO-based agentic RFT (reinforcement fine-tuning), a post-training strategy that combines reinforcement learning over multi-step agent trajectories with injected domain knowledge. The model is optimized for agent-based evaluation within a controlled RAG environment over a Thai legal corpus; in that specific setup, it outperforms larger general-purpose models on the NitiBench (CCL) benchmark.

Typhoon-S-4B NitiBench-CCL Legal Agent: Research Preview

Typhoon-S-4B NitiBench-CCL Legal Agent is a specialized research artifact from typhoon-ai, demonstrating that domain-specific "sovereignty" can outperform brute-force scale in certain applications. This model is not a general-purpose instruction model and is not intended for production or real-world legal use.

Key Capabilities & Differentiators

  • Agentic RFT: The model is post-trained as a multi-step agent, operating within a controlled RAG environment with search and read tools. Reinforcement learning (GRPO) is applied over entire interaction trajectories, optimizing for final-answer correctness.
  • InK-GRPO (Injected Knowledge GRPO): This extension augments GRPO with a stochastic auxiliary next-token prediction objective on in-domain Thai legal text, so that domain knowledge is injected efficiently during reinforcement fine-tuning.
  • Domain-Specific Optimization: Training is centered on NitiBench (CCL) and aligned Thai legal corpora, making it highly specialized for Thai legal reasoning tasks.
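
The two training ideas above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the advantage formula follows the commonly published GRPO formulation (rewards standardized within a group of trajectories sampled for the same prompt), and the names `group_relative_advantages`, `ink_grpo_step_loss`, `aux_prob`, and `aux_weight` are hypothetical. The model card does not say how often the auxiliary objective fires or how it is weighted.

```python
import random
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize final-answer rewards within one group of trajectories.

    Each reward scores a whole multi-step trajectory (for example 1.0 if the
    final answer is correct, 0.0 otherwise). Assumes the usual GRPO form:
    (reward - group mean) / (group std + eps).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def ink_grpo_step_loss(policy_loss, ntp_loss, rng, aux_prob=0.5, aux_weight=1.0):
    """Combine the RL loss with a stochastically applied next-token loss.

    policy_loss: scalar GRPO loss for the current batch.
    ntp_loss:    scalar next-token prediction loss on in-domain legal text.
    aux_prob, aux_weight: hypothetical knobs, not taken from the model card.
    Returns (total_loss, used_aux).
    """
    used_aux = rng.random() < aux_prob
    total = policy_loss + (aux_weight * ntp_loss if used_aux else 0.0)
    return total, used_aux


if __name__ == "__main__":
    # Four trajectories for one question: two ended with a correct answer.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print("advantages:", advantages)

    rng = random.Random(0)
    total, used_aux = ink_grpo_step_loss(0.8, 2.5, rng)
    print("total loss:", total, "aux applied:", used_aux)
```

Because advantages are computed relative to the group, a group in which every trajectory earns the same reward contributes no learning signal; only the contrast between better and worse trajectories for the same question drives the update.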

Good For

  • Researching Agentic RFT and InK-GRPO: Suited to studying the behavior and effectiveness of these post-training strategies.
  • NitiBench Agentic Evaluation: Specifically designed for benchmark comparison within the official agentic setup (see evaluation pipeline).

Important Limitations

  • Research-only: Not a deployable product model.
  • Not legal advice: Unsafe and unreliable for real-world legal applications.
  • Environment-dependent: Performance is meaningful only within the specified agent + RAG environment and evaluation protocol.
  • Benchmark-specific: Optimized for NitiBench (CCL) and not expected to be useful outside this intended setup.