notbdq/Qwen2.5-14B-Instruct-1M-GRPO-Reasoning
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Feb 1, 2025 · Architecture: Transformer

The notbdq/Qwen2.5-14B-Instruct-1M-GRPO-Reasoning model is a fine-tuned variant of Qwen2.5-14B-Instruct-1M, developed by notbdq. It applies GRPO (Group Relative Policy Optimization) using the Numina CoT dataset to enhance its reasoning capabilities. The model is optimized for complex problem-solving, particularly mathematical and logical tasks, and generates an explicit reasoning process before giving a final answer. This specialization makes it well suited to applications that require structured thought and detailed explanations.


Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model cover the following sampler settings:

- `temperature` – scales the randomness of sampling; higher values produce more varied output
- `top_p` – nucleus sampling; restricts choices to the smallest token set whose cumulative probability exceeds p
- `top_k` – restricts choices to the k most likely tokens
- `frequency_penalty` – penalizes tokens in proportion to how often they have already appeared
- `presence_penalty` – penalizes any token that has appeared at least once
- `repetition_penalty` – multiplicative penalty applied to previously generated tokens
- `min_p` – discards tokens whose probability falls below a fraction of the top token's probability
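As a sketch of how these settings might be passed in practice, the snippet below assembles a request payload for an OpenAI-compatible chat-completions endpoint. Only the model name comes from this page; every sampler value is an illustrative placeholder, not one of the popular configs tracked above, so tune them for your workload.

```python
# Sketch: build a chat-completion payload for this model with explicit
# sampler settings. Assumes an OpenAI-compatible API shape; all parameter
# values are illustrative placeholders.

def build_request(prompt: str) -> dict:
    """Assemble a request payload carrying common sampler parameters."""
    return {
        "model": "notbdq/Qwen2.5-14B-Instruct-1M-GRPO-Reasoning",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,         # randomness of sampling
        "top_p": 0.9,               # nucleus-sampling cutoff
        "top_k": 40,                # consider only the 40 most likely tokens
        "frequency_penalty": 0.0,   # penalize frequently repeated tokens
        "presence_penalty": 0.0,    # penalize any already-seen token
        "repetition_penalty": 1.1,  # multiplicative repetition penalty
        "min_p": 0.05,              # drop tokens below 5% of the top token's probability
    }

payload = build_request("Prove that the sum of two even integers is even.")
print(payload["model"])
```

Because the model emits its reasoning before the final answer, a low-to-moderate `temperature` with a mild `repetition_penalty` is a common starting point for math-style prompts; the payload itself would be POSTed to the provider's chat-completions route with your API key.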