AXCXEPT/EZO-Qwen2.5-32B-Instruct
AXCXEPT/EZO-Qwen2.5-32B-Instruct is a 32.8-billion-parameter instruction-tuned causal language model developed by AXCXEPT, based on Qwen/Qwen2.5-32B-Instruct. It features a 131,072-token context length and is optimized for Japanese language tasks, achieving performance approaching gpt-4-turbo on the Japanese MT Bench even under 4-bit quantization. Although its instruction tuning draws on high-quality Japanese Wikipedia and FineWeb data, the model is designed for global applicability.
Overview
AXCXEPT/EZO-Qwen2.5-32B-Instruct is an enhanced version of Qwen/Qwen2.5-32B-Instruct, developed by AXCXEPT. This 32.8-billion-parameter model has undergone multiple rounds of tuning that significantly improve its overall performance, with particular strength in Japanese language tasks. Although the tuning data is Japanese-focused, the training approach aims for broad applicability across diverse global needs.
Key Capabilities & Performance
- Japanese Language Proficiency: Achieves inference performance approaching gpt-4-turbo on the Japanese MT Bench (judged with gpt-4o) even when quantized to 4 bits.
- Instruction Tuning: Utilizes high-quality instruction data extracted from Japanese Wikipedia and FineWeb.
- Innovative Training: Employs a plain instruction tuning method with exemplary responses to enhance understanding and generation of high-quality content across various languages and contexts.
- Context Length: Supports a substantial context length of 131,072 tokens. A minimal loading and inference sketch follows this list.
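The model card itself does not include a usage snippet, so the following is a minimal sketch of loading the model with Hugging Face transformers under 4-bit quantization via bitsandbytes, roughly matching the quantized setting referenced in the benchmark above. The exact quantization configuration used in the evaluation is not published; the NF4 settings and the Japanese prompt below are assumptions for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "AXCXEPT/EZO-Qwen2.5-32B-Instruct"

# Assumed 4-bit NF4 setup; the card does not specify the exact config
# used for the Japanese MT Bench numbers.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Illustrative Japanese prompt (not from the card):
# "List five ideas to regain enthusiasm for work."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "仕事の熱意を取り戻すためのアイデアを5つ挙げてください。"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

At 4 bits, the 32.8B weights occupy roughly 17-20 GB, which is what makes single-GPU inference on an 80 GB A100 (or similar) practical for a model of this size.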
Training Details
The model was trained on high-quality instruction data derived from Japanese Wikipedia and FineWeb. An innovative training approach, including pre-instruction training, was used to secure performance gains across languages and domains, making the model suitable for global use despite its Japanese-centric dataset. Training ran for 32 hours on A100 GPUs.
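For orientation only, below is a generic supervised fine-tuning sketch using the Hugging Face TRL library. This is not AXCXEPT's actual pipeline (the card publishes no training code), and the dataset file name and hyperparameters are placeholders; the card only states that the data was derived from Japanese Wikipedia and FineWeb.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical instruction dataset in chat/JSONL form; the real data
# mixture used by AXCXEPT is not published.
dataset = load_dataset("json", data_files="instruction_data.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # the base model named in the card
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="ezo-qwen2.5-32b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,  # placeholder values
        num_train_epochs=1,
        bf16=True,
    ),
)
trainer.train()
```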