AXCXEPT/EZO2.5-gemma-3-12b-it-Preview

  • Vision: supported
  • Concurrency Cost: 1
  • Model Size: 12B
  • Quant: FP8
  • Ctx Length: 32k
  • License: Gemma
  • Architecture: Transformer

AXCXEPT/EZO2.5-gemma-3-12b-it-Preview is a 12 billion parameter instruction-tuned model developed by AXCXEPT on top of Google's Gemma-3 architecture. It uses the developers' proprietary "EZO" training method, which integrates GRPO and PPO concepts, to significantly improve Japanese-language performance on benchmarks such as Japanese MT Bench and Elyza Tasks100. The approach targets improving base-model capabilities with limited data and compute, offering a cost-effective alternative to more complex reinforcement learning pipelines.


Model Overview

AXCXEPT/EZO2.5-gemma-3-12b-it-Preview is a 12 billion parameter instruction-tuned model built upon Google's Gemma-3 architecture. Developed by AXCXEPT, this model introduces a novel training methodology called "EZO," which integrates concepts from GRPO and PPO to enable autonomous capability improvement in LLMs.

Key Capabilities & Training

  • Enhanced Japanese Performance: The model demonstrates significant improvements in Japanese language tasks, as evidenced by its performance on the Japanese MT Bench and Elyza Tasks100 benchmarks.
  • Efficient Training: Achieved performance gains with a relatively small dataset (3,000 samples) and limited training time (2 hours on 8 H200 GPUs), showcasing the efficiency of the EZO method.
  • Cost-Effective Reinforcement Learning Alternative: The EZO training method is presented as a viable, lower-budget alternative to more complex and time-consuming reinforcement learning approaches like GRPO/PPO.
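The model card does not publish EZO's internals, but one of the ingredients it names, GRPO, is known for replacing PPO's learned value baseline with group-relative advantages: several completions are sampled per prompt, and each reward is normalized against its own group's statistics. A minimal, illustrative sketch of that idea (not AXCXEPT's implementation; function name and example rewards are hypothetical):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    In GRPO, the group of sampled completions for one prompt serves as
    its own baseline, so no separate value network is trained.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled completions for one prompt, scored by a reward model.
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
# Completions scoring above the group mean get positive advantages;
# those below get negative ones, and the advantages sum to (near) zero.
```

Avoiding the value network is what makes this family of methods cheaper than classic PPO, which is consistent with the low-budget framing above.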

Performance Highlights

In benchmarks against the base google/gemma-3-12b-it model, AXCXEPT/EZO2.5-gemma-3-12b-it-Preview shows notable performance improvements in Japanese, in some cases approaching the capabilities of larger 32B and 72B models. The developers plan further research, including English benchmarks, to validate the practical utility of the training outcomes.

Intended Use

This model is developed primarily for research purposes. Users should be aware that the training method is still in its research phase, requiring further automation and ablation studies. It is suitable for exploring efficient LLM fine-tuning techniques, particularly for Japanese language tasks under resource constraints.
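For hands-on experimentation, Gemma-family instruction-tuned checkpoints expect a multi-turn chat format with explicit turn markers. In practice you would let the tokenizer build this via its chat template (e.g. transformers' `apply_chat_template`); the hand-rolled sketch below shows the documented Gemma turn structure, with the helper name being our own (verify the exact markers against this model's tokenizer config):

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a single user turn in Gemma's chat markers.

    The trailing "model" turn header cues the model to generate its reply.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

# Example: a Japanese prompt, matching the model's strongest use case.
prompt = build_gemma_prompt("日本の首都はどこですか？")
```

Feeding such a prompt to the checkpoint (e.g. via a transformers generation pipeline) is a reasonable starting point for evaluating the Japanese-language improvements described above.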