AXCXEPT/EZO2.5-gemma-3-12b-it-Preview

Modality: Vision
Parameters: 12B
Quantization: FP8
Context length: 32768
License: gemma

Model Overview

AXCXEPT/EZO2.5-gemma-3-12b-it-Preview is a 12-billion-parameter instruction-tuned model built on Google's Gemma-3 architecture. Developed by AXCXEPT, the model introduces a novel training methodology called "EZO," which integrates concepts from GRPO and PPO to enable autonomous capability improvement in LLMs.
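
The EZO method itself is not documented in detail on this card. As background for the GRPO/PPO concepts it reportedly draws on, the sketch below shows a group-relative advantage computation and a PPO-style clipped surrogate loss in generic PyTorch; the function names and toy values are illustrative only, not the authors' implementation.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # GRPO-style: normalize rewards within a group of completions sampled
    # for the same prompt, so no separate value network is required.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def clipped_surrogate(logp_new: torch.Tensor,
                      logp_old: torch.Tensor,
                      advantages: torch.Tensor,
                      clip_eps: float = 0.2) -> torch.Tensor:
    # PPO-style clipped objective: limit how far the updated policy can
    # move from the policy that generated the samples.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy example: four completions for one prompt, scored by some reward signal.
rewards = torch.tensor([0.2, 0.9, 0.5, 0.1])
adv = group_relative_advantages(rewards)
logp_old = torch.tensor([-12.0, -10.5, -11.2, -13.0])
logp_new = logp_old + torch.tensor([0.1, 0.3, -0.05, 0.0])  # after one policy update
print(clipped_surrogate(logp_new, logp_old, adv))
```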

Key Capabilities & Training

  • Enhanced Japanese Performance: The model demonstrates significant improvements in Japanese language tasks, as evidenced by its performance on the Japanese MT Bench and Elyza Tasks100 benchmarks.
  • Efficient Training: Achieved performance gains with a relatively small dataset (3,000 samples) and limited training time (2 hours on 8 H200 GPUs), showcasing the efficiency of the EZO method.
  • Cost-Effective Reinforcement Learning Alternative: The EZO training method is presented as a viable, lower-budget alternative to more complex and time-consuming reinforcement learning approaches like GRPO/PPO.

Performance Highlights

Benchmarked against the base google/gemma-3-12b-it model, AXCXEPT/EZO2.5-gemma-3-12b-it-Preview shows notable improvements on Japanese tasks, in some cases approaching the capabilities of larger 32B and 72B models. The developers plan further research, including evaluation on English benchmarks, to validate the practical utility of the training outcomes.

Intended Use

This model is developed primarily for research purposes. Users should be aware that the training method is still in its research phase, requiring further automation and ablation studies. It is suitable for exploring efficient LLM fine-tuning techniques, particularly for Japanese language tasks under resource constraints.
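
For experimentation, the checkpoint can presumably be loaded like other Gemma-3 instruction-tuned models via Hugging Face transformers. The snippet below is a minimal text-only sketch: it assumes a recent transformers release with Gemma-3 support, sufficient GPU memory, and that the standard AutoModelForCausalLM path applies to this checkpoint; the prompt and generation settings are examples only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AXCXEPT/EZO2.5-gemma-3-12b-it-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bfloat16 support
    device_map="auto",
)

# Japanese instruction example (translation: "Briefly explain the difference
# between machine learning and deep learning.")
messages = [
    {"role": "user", "content": "機械学習とディープラーニングの違いを簡潔に説明してください。"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If the checkpoint is registered as a multimodal Gemma-3 model, the AutoProcessor / Gemma3ForConditionalGeneration loading path shown on the base google/gemma-3-12b-it card may be needed instead.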