daminzombie/affine-test is Qwen3-4B-Thinking-2507, a 4.0-billion-parameter causal language model developed by Qwen. This model is specifically enhanced for complex reasoning tasks, including logical reasoning, mathematics, science, and coding, and features an extended native context length of 262,144 tokens. It is optimized for "thinking mode" to improve the quality and depth of reasoning, making it suitable for applications requiring advanced problem-solving capabilities.
Qwen3-4B-Thinking-2507: Enhanced Reasoning Model
Qwen3-4B-Thinking-2507 is a 4.0-billion-parameter causal language model from the Qwen3 family, specifically designed to excel at complex reasoning tasks. This iteration significantly improves on previous versions by enhancing both the quality and depth of its reasoning capabilities across various domains.
Key Capabilities and Enhancements
- Superior Reasoning Performance: Demonstrates marked improvements in logical reasoning, mathematics, science, and coding, tasks that typically require human expertise.
- Extended Context Length: Features an impressive native context length of 262,144 tokens, enabling deep understanding and processing of very long inputs.
- Dedicated Thinking Mode: This model operates primarily in a "thinking mode," automatically incorporating internal reasoning processes to tackle highly complex problems. It is recommended for use cases where intricate problem-solving is paramount.
- General Capability Improvements: Shows better instruction following, tool usage, text generation, and alignment with human preferences.
- Agentic Abilities: Excels in tool calling, with recommendations to use Qwen-Agent for optimal agentic performance.
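Because the model operates in thinking mode, its decoded output contains an internal reasoning block followed by the final answer. A minimal sketch of separating the two, assuming the reasoning ends with a `</think>` tag as produced by the Qwen3 chat template (the helper name `split_thinking` and the sample string are illustrative, not part of the model's API):

```python
def split_thinking(decoded: str) -> tuple[str, str]:
    """Split a Qwen3 thinking-mode completion into (reasoning, answer).

    Assumes the reasoning portion ends with a `</think>` tag, as in the
    Qwen3 chat template. If the tag is absent (e.g. the generation was
    truncated mid-thought), the whole text is treated as reasoning.
    """
    marker = "</think>"
    idx = decoded.find(marker)
    if idx == -1:
        return decoded.strip(), ""
    reasoning = decoded[:idx].strip()
    answer = decoded[idx + len(marker):].strip()
    return reasoning, answer


# Illustrative decoded output, not a real model completion:
sample = "Let me check: 2 + 2 = 4.</think>The answer is 4."
reasoning, answer = split_thinking(sample)
print(reasoning)  # → Let me check: 2 + 2 = 4.
print(answer)     # → The answer is 4.
```

Keeping the split in one place makes it easy to log or discard the reasoning trace while surfacing only the final answer to users.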
Performance Highlights
The model shows strong performance across various benchmarks, particularly in reasoning and agentic tasks. For instance, it achieves 81.3 on AIME25 and 55.5 on HMMT25, outperforming its predecessors. It also demonstrates competitive results in coding benchmarks like LiveCodeBench and CFEval, and strong alignment scores on IFEval and Creative Writing v3.
Best Practices for Optimal Use
To maximize performance, users are advised to use the recommended sampling parameters (e.g., Temperature=0.6, TopP=0.95), allow adequate output length (32,768 tokens for most queries, up to 81,920 for complex problems), and standardize output formats for tasks such as math and multiple-choice questions. For reasoning-intensive applications, keep the context length above 131,072 tokens where possible so the model has sufficient space for its internal thought process, and reduce it only if out-of-memory issues arise.
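The recommended settings above can be gathered into one decoding configuration; a minimal sketch using Hugging Face `generate()` keyword names (the `top_k` value is an assumption carried over from Qwen's general Qwen3 guidance and is not stated in this card):

```python
# Decoding settings from the best-practice values above, expressed as
# Hugging Face `model.generate(**GENERATION_KWARGS)` keyword arguments.
GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.6,      # recommended sampling temperature
    "top_p": 0.95,           # recommended nucleus-sampling cutoff
    "top_k": 20,             # assumption: Qwen's usual companion setting
    "max_new_tokens": 32768, # raise toward 81920 for very hard problems
}
```

Centralizing these values keeps experiments reproducible and makes it obvious where to raise `max_new_tokens` for harder problems.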