Name: endishai/qwen2.5-32b-lexenvs-grpo API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: endishai

Overview

This model, endishai/qwen2.5-32b-lexenvs-grpo, is a specialized 32.8 billion parameter language model based on the Qwen/Qwen2.5-32B-Instruct architecture. It has been fine-tuned using the GRPO (Generalized Reinforcement Learning with Policy Optimization) method to excel specifically in credit card optimization reasoning and financial portfolio selection.

Key Capabilities & Performance

Specialized Reasoning: Optimized for complex credit card optimization scenarios.
Superior Performance: Achieves an average reward of ~0.51 on a held-out test set of 30 tasks, significantly outperforming:
- Claude Opus 4.6 (~0.41)
- Claude Sonnet 4.6 (0.396)
- GPT-4o (0.363)
- The base Qwen 32B model (~0.24)
Training Details: Trained with GRPO via TRL, utilizing a LoRA adapter (rank 32) on 2x A100-80GB GPUs, using the endishai/lexenvs-tasks dataset.

Intended Use Cases

Credit Card Optimization: Ideal for tasks requiring reasoning about credit card rewards, benefits, and spending strategies.
Financial Portfolio Selection: Suitable for applications involving the selection and optimization of financial instruments related to credit.

Important Considerations

This model is not intended for live consumer financial advice but rather for analytical and reasoning support in financial contexts.
A LoRA adapter-only version is also available at endishai/qwen2.5-32b-lexenvs-grpo-lora.

Overview

Overview

Key Capabilities & Performance

Intended Use Cases

Important Considerations

Full Model Card (README)