Tanuki-8B-dpo-v1.0 Overview
Tanuki-8B-dpo-v1.0 is an 8 billion parameter large language model developed by weblab-GENIAC, a collaborative project under the Matsuo Lab LLM Development Project. It was pretrained from scratch on approximately 1.3 trillion tokens and subsequently fine-tuned for dialogue using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).
Key Capabilities & Features
- Japanese Language Optimization: Specifically tuned for conversational Japanese, using a Japanese Alpaca-style prompt format.
- Instruction Following: Designed to follow instructions effectively; using the recommended default system prompt yields the best results.
- Benchmarked Performance: Achieves a Japanese MT-Bench average score of 7.24, with strong results in humanities (9.1), roleplay (8.75), and writing (9.05).
- Quantized Versions Available: AWQ 4-bit, GPTQ 4-bit, and GPTQ 8-bit quantized models are provided for efficient deployment; GGUF versions are not recommended due to potential performance degradation.
- Human Evaluation Data: Publicly available human evaluation data (approximately 2000 entries) from a Chatbot Arena-like blind test, providing transparent performance insights.
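As a rough illustration of the Japanese Alpaca-style format mentioned above, the sketch below assembles a single-turn prompt. The exact preamble and section markers here are assumptions based on the common Japanese Alpaca template; the authoritative format ships with the model's tokenizer (e.g. via `apply_chat_template`), so treat these strings as placeholders and check the model repository before use.

```python
# Hypothetical Japanese Alpaca-style preamble; the real recommended
# system prompt is defined in the model repository.
SYSTEM_PROMPT = (
    "以下は、タスクを説明する指示です。"
    "要求を適切に満たす応答を書きなさい。"
)

def build_prompt(instruction: str) -> str:
    """Assemble a single-turn Alpaca-style prompt:
    preamble, instruction section, then an open response section
    that the model is expected to complete."""
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"### 指示:\n{instruction}\n\n"
        f"### 応答:\n"
    )

prompt = build_prompt("自己紹介をしてください。")
print(prompt)
```

In practice, loading the tokenizer from the model repository and calling its built-in chat template is preferable to hand-assembling strings, since the template is guaranteed to match what the model saw during SFT/DPO training.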
Use Cases
This model is particularly well-suited for Japanese-centric conversational AI applications, chatbots, and tasks requiring nuanced understanding and generation of Japanese text. Its DPO tuning makes it effective for generating helpful and engaging responses in dialogue systems.
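To see why the 4-bit quantized releases matter for deployment, a back-of-the-envelope estimate of weight memory helps. This sketch assumes exactly 8 billion parameters for simplicity and counts weights only, ignoring KV cache and activations, which add further memory at inference time.

```python
# Approximate weight footprint of an 8B-parameter model at
# different precisions. Weights only; KV cache and activations
# are not included.
PARAMS = 8_000_000_000

def weight_gib(bits_per_param: int) -> float:
    """Weight memory in GiB: params * bits / 8 bytes, scaled to GiB."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16 = weight_gib(16)  # unquantized half precision
int4 = weight_gib(4)   # AWQ / GPTQ 4-bit
print(f"fp16: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB")
```

At half precision the weights alone need roughly 15 GiB, while the 4-bit AWQ/GPTQ variants cut that to under 4 GiB, bringing the model within reach of a single consumer GPU.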