ArrowCanaria-Llama-8B-RL-v0.1: Enhanced Japanese AItuber Model
ArrowCanaria-Llama-8B-RL-v0.1 is an 8-billion-parameter model from DataPilot. It is built on the tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5 base and continues from ArrowCanaria-Llama-8B-SFT-v0.1, adding a two-phase Reinforcement Learning from Human Feedback (RLHF) stage that significantly improves response quality for Japanese AItuber and chatbot use cases. Optimization uses the GRPO (Group Relative Policy Optimization) algorithm with DAPO loss for training stability, strengthening empathetic and knowledge-based interactions.
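The core idea behind GRPO is that each prompt's sampled completions are scored by a reward model and normalized against their own group, removing the need for a learned value critic. A minimal sketch of that group-relative advantage computation (illustrative of the algorithm family, not the actual training code for this model):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each completion's reward against
    the mean and std of its own sampling group (no value critic needed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, four sampled completions scored by the reward model:
advs = group_relative_advantages([0.9, 0.4, 0.6, 0.1])
```

Completions scoring above their group mean receive positive advantages and are reinforced; the advantages of a group always sum to (approximately) zero.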
Key Capabilities
- Natural Japanese Responses: Delivers human-like dialogue, free from boilerplate or translation-like phrasing, further refined by RLHF.
- High Consultation Performance: Optimized for empathy, active listening, and providing concrete advice through external Reward Model feedback.
- Role-Play & Character Dialogue: Retains consistent personality and emotional expression acquired during SFT.
- Reasoning & Knowledge Response: Enhanced for accurate and clear knowledge delivery.
- Tool Use / RAG: Supports Function Calling and Retrieval-Augmented Generation.
- Creative Expression: Produces richly expressive Japanese, including literary metaphor and vivid description.
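For the Function Calling capability above, a typical integration parses the model's emitted tool call and dispatches it to a local function. The card does not specify the model's tool-call wire format, so a JSON object with `name` and `arguments` keys is assumed here purely for illustration:

```python
import json

# Hypothetical tool registry; `get_weather` is a stand-in, not an API
# shipped with the model.
TOOLS = {
    "get_weather": lambda city: f"{city}: sunny, 22C",
}

def dispatch(model_output: str) -> str:
    """Parse an assumed JSON tool call emitted by the model and run it."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = dispatch('{"name": "get_weather", "arguments": {"city": "Tokyo"}}')
```

The dispatcher's result would then be fed back to the model as a tool message so it can compose the final answer.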
Good for
- AItuber / AI VTuber: Engaging in casual chat and responding to comments during live streams.
- Chatbots: Facilitating natural daily conversations and offering empathetic advice in Japanese.
- Role-Play: Supporting character-driven dialogue and creative writing.
- General Assistant: Handling a wide range of tasks including knowledge retrieval, reasoning, and tool invocation.
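For the Retrieval-Augmented Generation use case listed above, the surrounding application retrieves relevant passages and prepends them to the prompt. A toy sketch of that flow, using keyword overlap in place of a real embedding index (the retriever and prompt layout are assumptions, not part of this model's harness):

```python
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    A real deployment would use an embedding index instead."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend the retrieved passages so the model answers from them."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Swallow is a Llama model series continually pretrained on Japanese.",
    "GRPO optimizes a policy against group-relative rewards.",
]
prompt = build_prompt("What does GRPO optimize?", docs)
```

The assembled prompt is then passed to the model as the user turn; grounding the answer in retrieved context is what keeps knowledge responses accurate.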