DataPilot/ArrowCanaria-Llama-8B-RL-v0.1
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:Mar 21, 2026License:llama3.1Architecture:Transformer0.0K Cold

DataPilot/ArrowCanaria-Llama-8B-RL-v0.1 is an 8 billion parameter Llama 3.1-based model developed by DataPilot, specifically optimized for Japanese AItuber and chatbot applications. It enhances the SFT model (ArrowCanaria-Llama-8B-SFT-v0.1) through a two-phase Reinforcement Learning from Human Feedback (RLHF) process using GRPO and DAPO loss, focusing on improving empathy, listening quality, and the accuracy and clarity of knowledge responses. This model excels at natural Japanese conversation, empathetic consultation, role-play, and general assistant tasks, while maintaining a context length of 4096 tokens.

Loading preview...