FILM6912/typhoon-s-4b-nitibench-ccl-legal-agent-research-preview

Hugging Face
Text generation | Concurrency cost: 1 | Model size: 4B | Quant: BF16 | Context length: 32k | Published: Feb 7, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights | Warm

FILM6912/typhoon-s-4b-nitibench-ccl-legal-agent-research-preview is a 4-billion-parameter research artifact developed by FILM6912, designed to demonstrate domain-specific sovereignty in legal reasoning. The model is trained with InK-GRPO-based agentic Reinforcement Fine-Tuning (RFT) within a controlled RAG environment and specializes in Thai legal question answering. It is optimized for agentic evaluation on the NitiBench (CCL) benchmark, where it achieves 78.02% accuracy in that specific setup.


Typhoon-S-4B NitiBench-CCL Legal Agent (Research Preview)

This 4-billion-parameter model is a research artifact from FILM6912, designed to explore domain-specific sovereignty in AI. It is neither a general-purpose instruction model nor intended for production legal use.

Key Capabilities & Innovations

  • Agentic Reinforcement Fine-Tuning (RFT): The model is trained as a multi-step agent, operating within a controlled RAG environment with search and read tools.
  • InK-GRPO (Injected Knowledge GRPO): Augments GRPO with a stochastic auxiliary next-token prediction objective on in-domain Thai legal text, injecting domain knowledge during RFT.
  • Specialized Legal Reasoning: Post-trained on NitiBench (CCL) and aligned Thai legal corpora, focusing on question-answer tasks.
  • Benchmark Performance: Achieves 78.02% accuracy on the NitiBench (Thai Legal Reasoning, Agentic) benchmark when run in its specified agentic setup, outperforming larger models such as GPT-5 under those same conditions.
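
The InK-GRPO bullet above can be illustrated with a minimal sketch. It assumes the commonly used GRPO group-relative advantage (each rollout's reward standardized within its group) and models the "stochastic auxiliary" objective as a next-token-prediction loss that is added on a given step with some probability. The function names, the mixing probability `aux_prob`, and the weight `aux_weight` are hypothetical; the source does not state how the auxiliary loss is sampled or weighted, and the policy term here omits the clipping and KL regularization a real trainer would likely use.

```python
import random
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def policy_loss(logprobs, advantages):
    """Simplified policy-gradient surrogate (assumption: no clipping, no KL term)."""
    total = sum(lp * adv for lp, adv in zip(logprobs, advantages))
    return -total / len(logprobs)


def ink_grpo_step_loss(logprobs, rewards, ntp_loss,
                       aux_prob=0.5, aux_weight=0.1, rng=random):
    """Policy loss plus a stochastically applied auxiliary next-token loss.

    ntp_loss stands in for a cross-entropy value computed on a batch of
    in-domain text; aux_prob and aux_weight are illustrative values only.
    """
    advantages = group_relative_advantages(rewards)
    loss = policy_loss(logprobs, advantages)
    # With probability aux_prob, inject the knowledge term on this step.
    if rng.random() < aux_prob:
        loss += aux_weight * ntp_loss
    return loss


if __name__ == "__main__":
    # Four rollouts for one question: two judged correct, two incorrect.
    rewards = [1.0, 0.0, 0.0, 1.0]
    logprobs = [-1.0, -2.0, -1.5, -0.5]  # summed sequence log-probabilities
    print(ink_grpo_step_loss(logprobs, rewards, ntp_loss=2.3))
```

Setting `aux_prob=0.0` recovers the plain policy term, which makes it easy to compare training with and without the injected-knowledge objective in an ablation.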

Intended Use & Limitations

This model is research-only and primarily for studying Agentic RFT and InK-GRPO behavior. It is only meaningful when evaluated using its official agentic setup (agent + RAG environment) and is not suitable for real-world legal advice or general-purpose tasks. Performance is highly environment-dependent and benchmark-specific, with no guarantees for safety, bias, or robustness.
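
To make the dependence on the agent plus RAG environment concrete, here is a toy sketch of what such a harness could look like in its simplest form. It is not the official evaluation setup: `ToyRagEnv`, `run_agent`, `scripted_policy`, and the keyword-overlap search are all hypothetical stand-ins, and the scripted policy takes the place of the model so the example runs without any weights.

```python
from dataclasses import dataclass


@dataclass
class ToyRagEnv:
    """Hypothetical stand-in for a controlled RAG environment with two tools."""
    docs: dict  # maps doc_id -> document text

    def search(self, query):
        """Return ids of documents sharing at least one word with the query."""
        terms = set(query.lower().split())
        return [doc_id for doc_id, text in self.docs.items()
                if terms & set(text.lower().split())]

    def read(self, doc_id):
        """Return the full text of one document, or an empty string."""
        return self.docs.get(doc_id, "")


def run_agent(policy, env, question, max_steps=4):
    """Multi-step loop: the policy emits (action, argument) pairs until it answers."""
    history = []
    for _ in range(max_steps):
        action, arg = policy(question, history)
        if action == "answer":
            return arg
        if action == "search":
            observation = env.search(arg)
        elif action == "read":
            observation = env.read(arg)
        else:
            raise ValueError(f"unknown action: {action}")
        history.append((action, arg, observation))
    return ""  # step budget exhausted without an answer


def scripted_policy(question, history):
    """Placeholder for the model: search, read the first hit, answer with it."""
    if not history:
        return ("search", question)
    last_action, _, observation = history[-1]
    if last_action == "search":
        return ("read", observation[0]) if observation else ("answer", "")
    return ("answer", observation)


def accuracy(predictions, references):
    """Exact-match accuracy over paired predictions and references."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


if __name__ == "__main__":
    env = ToyRagEnv({"s1": "theft carries imprisonment",
                     "s2": "contracts require consent"})
    print(run_agent(scripted_policy, env, "theft penalty"))
```

Changing the retriever, the step budget, or the tool definitions in a harness like this changes what the agent sees at each step, which is why scores obtained outside the specified setup are not comparable to the reported figure.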