EZO-Qwen2.5-72B-Instruct Overview
AXCXEPT/EZO-Qwen2.5-72B-Instruct is a 72.7-billion-parameter model built on the Qwen/Qwen2.5-72B-Instruct base. It supports a 131,072-token context length, allowing it to process long inputs and generate extended responses. The model has been fine-tuned to improve overall performance, with particular emphasis on Japanese language tasks.
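A minimal usage sketch is shown below, assuming the standard Hugging Face Transformers chat-template workflow used by Qwen2.5-Instruct models; the example prompt and generation settings are illustrative, not official recommendations.

```python
# Minimal sketch: load the model and run one chat turn with Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AXCXEPT/EZO-Qwen2.5-72B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard the 72.7B parameters across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain Japan's four seasons briefly."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```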
Key Capabilities & Performance
- Exceptional Japanese Language Performance: Scored above GPT-4-Turbo on the Japanese MT Bench (judged by GPT-4o), even when running with 4-bit quantization (see the loading sketch after this list).
- Multilingual Adaptability: Although the fine-tuning emphasizes Japanese, the model is designed to address diverse global needs and retains strong performance across other languages.
- Instruction Tuning: Uses a plain instruction tuning method with high-quality data extracted from Japanese Wikipedia and FineWeb, improving its ability to follow instructions and generate accurate responses.
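The 4-bit benchmark result suggests the model remains usable under aggressive quantization. Below is a sketch of one way to load it in 4 bits with bitsandbytes through Transformers; the exact quantization configuration used for the evaluation is not documented here, so these settings are assumptions.

```python
# Sketch: 4-bit loading via bitsandbytes (settings are assumed, not the
# evaluation's documented configuration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "AXCXEPT/EZO-Qwen2.5-72B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weight quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16 for stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```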
Training & Data
Training involved creating instruction data from high-quality Japanese Wikipedia and FineWeb datasets. This approach, which includes pre-instruction training, aims for performance improvements across languages and domains, making the model suitable for a wide range of global use cases.
When to Use This Model
- Japanese Language Applications: Ideal for tasks requiring high proficiency in Japanese, such as content generation, translation, and complex query answering.
- Multilingual Instruction Following: Suitable for applications that benefit from a model capable of understanding and generating responses in multiple languages, especially where robust instruction adherence is critical.
- Research and Development: Positioned as an experimental prototype for research and development purposes, offering a powerful base for further exploration and fine-tuning.