ericflo/Qwen2.5-7B-Think-KTO-v0.2

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jan 29, 2025License:apache-2.0Architecture:Transformer Open Weights Warm

The ericflo/Qwen2.5-7B-Think-KTO-v0.2 is a 7.6 billion parameter language model built upon the Qwen2.5-7B architecture, featuring a 131072 token context length. It is specifically enhanced for reasoning tasks through a two-stage training process combining Supervised Fine-Tuning (SFT) and Kahneman-Tversky Optimization (KTO). This model excels at generating responses with an explicit thought process, making it suitable for applications requiring transparent, human-like reasoning and problem-solving.

Loading preview...

Qwen2.5-Think-KTO v0.2: Reasoning-Enhanced Language Model

This model, developed by Eric Florenzano, is a 7.6 billion parameter variant of the Qwen2.5-7B base model, specifically engineered to improve reasoning capabilities. It utilizes a unique two-stage training approach: initial Supervised Fine-Tuning (SFT) with expert demonstrations, followed by Kahneman-Tversky Optimization (KTO) using binary feedback signals. This process aims to provide more robust and consistent reasoning.

Key Capabilities

  • Explicit Thought Process: Generates responses in a <think>...</think> then answer format, making its reasoning transparent.
  • Enhanced Reliability: Shows improved consistency in generating reasoning tags compared to its predecessor.
  • Two-Stage Training: Leverages SFT and KTO on an expanded dataset of nearly 500 datapoints for refined reasoning.
  • Optimized for Reasoning: Designed to mimic natural thought processes and benefit from human-like reasoning.

What's It Good For?

  • Tasks requiring a clear, step-by-step thought process.
  • Scenarios where binary feedback (desirable/undesirable outputs) can be leveraged.
  • Problems that benefit from human-like, transparent reasoning.
  • Applications needing a distinct progression from thought to final answer.

Limitations

Performance on non-reasoning tasks remains consistent with the base Qwen2.5-7B model, and its generalization beyond the training distribution may be limited. Consistency is still an area for potential improvement.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p