endishai/qwen2.5-32b-lexenvs-grpo

TEXT GENERATIONConcurrency Cost:2Model Size:32.8BQuant:FP8Ctx Length:32kPublished:Apr 20, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The endishai/qwen2.5-32b-lexenvs-grpo model is a 32.8 billion parameter variant of Qwen2.5-32B-Instruct, specialized for credit card optimization reasoning. Developed by endishai, this model utilizes GRPO training to achieve superior performance in financial portfolio selection tasks. It demonstrates an average reward of ~0.51 on a held-out test set, outperforming larger models like Claude Opus 4.6 and GPT-4o in its specific domain. This model is designed for complex financial decision-making support, particularly in credit card strategy.

Loading preview...

Overview

This model, endishai/qwen2.5-32b-lexenvs-grpo, is a specialized 32.8 billion parameter language model based on the Qwen/Qwen2.5-32B-Instruct architecture. It has been fine-tuned using the GRPO (Generalized Reinforcement Learning with Policy Optimization) method to excel specifically in credit card optimization reasoning and financial portfolio selection.

Key Capabilities & Performance

  • Specialized Reasoning: Optimized for complex credit card optimization scenarios.
  • Superior Performance: Achieves an average reward of ~0.51 on a held-out test set of 30 tasks, significantly outperforming:
    • Claude Opus 4.6 (~0.41)
    • Claude Sonnet 4.6 (0.396)
    • GPT-4o (0.363)
    • The base Qwen 32B model (~0.24)
  • Training Details: Trained with GRPO via TRL, utilizing a LoRA adapter (rank 32) on 2x A100-80GB GPUs, using the endishai/lexenvs-tasks dataset.

Intended Use Cases

  • Credit Card Optimization: Ideal for tasks requiring reasoning about credit card rewards, benefits, and spending strategies.
  • Financial Portfolio Selection: Suitable for applications involving the selection and optimization of financial instruments related to credit.

Important Considerations

  • This model is not intended for live consumer financial advice but rather for analytical and reasoning support in financial contexts.
  • A LoRA adapter-only version is also available at endishai/qwen2.5-32b-lexenvs-grpo-lora.