Gandalf1/qwen3-8b-finance-sft-phase2
Gandalf1/qwen3-8b-finance-sft-phase2 is an 8.2 billion parameter causal language model from the Qwen3 series, developed by Qwen. It operates in two modes within a single model: a 'thinking mode' for complex reasoning, math, and coding, and a 'non-thinking mode' for general dialogue. The model is tuned for human preference alignment, agent capabilities, and multilingual instruction following across more than 100 languages. Its native context length is 32,768 tokens, extendable to 131,072 tokens via YaRN.
Qwen3-8B: Dual-Mode LLM for Enhanced Reasoning and Multilingual Support
Qwen3-8B is an 8.2 billion parameter causal language model from the Qwen3 series, trained through both pretraining and post-training stages. Its key differentiator is the ability to switch between a 'thinking mode' and a 'non-thinking mode' within a single model. The thinking mode targets complex logical reasoning, mathematics, and coding tasks, while the non-thinking mode handles efficient, general-purpose dialogue.
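As a rough illustration of how an application might handle the two modes, the sketch below separates the reasoning trace from the final answer in a generated string. It assumes that thinking-mode output wraps its reasoning in `<think>...</think>` tags, as upstream Qwen3 models do; whether this fine-tune preserves that format is an assumption, and `split_thinking` is a hypothetical helper name, not part of any library.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, as in upstream Qwen3 output.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer). Reasoning is empty when no think block is present."""
    match = _THINK_RE.search(text)
    if match is None:
        # Non-thinking mode: the whole string is the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hand-written example string standing in for real model output.
raw = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
reasoning, answer = split_thinking(raw)
print(reasoning)
print(answer)
```

Keeping the two parts separate lets a caller log or hide the reasoning while showing only the answer to end users.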
Key Capabilities
- Enhanced Reasoning: Improves on earlier Qwen models (QwQ in thinking mode, Qwen2.5 instruct models in non-thinking mode) in mathematics, code generation, and commonsense logical reasoning.
- Human Preference Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural conversational experience.
- Agentic Capabilities: Demonstrates strong tool-calling abilities, achieving leading performance among open-source models in complex agent-based tasks, especially when integrated with Qwen-Agent.
- Multilingual Support: Supports over 100 languages and dialects with robust multilingual instruction following and translation capabilities.
- Extended Context Window: Natively handles 32,768 tokens, with support for up to 131,072 tokens using the YaRN method for long text processing.
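The relationship between the native and extended context lengths listed above can be sketched as a YaRN scaling factor. The dictionary keys below (`rope_type`, `factor`, `original_max_position_embeddings`) follow the `rope_scaling` layout used in upstream Qwen3 configuration examples; treat them as an assumption for this particular checkpoint, and `yarn_rope_scaling` as a hypothetical helper.

```python
NATIVE_CONTEXT = 32_768     # native context length stated in the card
EXTENDED_CONTEXT = 131_072  # extended context length stated in the card


def yarn_rope_scaling(target_len: int, native_len: int = NATIVE_CONTEXT) -> dict:
    """Build a rope_scaling-style dict for a target context length (assumed key names)."""
    if target_len <= native_len:
        raise ValueError("target fits in the native window; no scaling needed")
    return {
        "rope_type": "yarn",
        # Ratio of the desired window to the native window.
        "factor": target_len / native_len,
        "original_max_position_embeddings": native_len,
    }


config = yarn_rope_scaling(EXTENDED_CONTEXT)
print(config["factor"])  # 4.0, since 131072 / 32768 = 4
```

A factor smaller than 4.0 may be preferable when inputs are only moderately longer than the native window, since scaling is generally applied only when long inputs actually require it.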
When to Use This Model
- Complex Problem Solving: Ideal for applications requiring deep reasoning, such as mathematical problem-solving or code generation, by leveraging its 'thinking mode'.
- Interactive Agents: Suitable for building sophisticated AI agents that require precise integration with external tools and complex task execution.
- Multilingual Applications: Excellent choice for global applications needing strong performance across a wide array of languages and dialects.
- Long Document Analysis: Beneficial for tasks involving extensive text, such as summarizing long articles or processing lengthy conversations, due to its extended context window.
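For the interactive-agent use case, an application has to turn the model's tool-call text into real function calls. The sketch below shows one minimal way to do that, assuming tool calls arrive as JSON objects with `name` and `arguments` fields inside `<tool_call>...</tool_call>` tags, as in upstream Qwen chat templates. The `get_stock_price` tool, its canned data, and the `run_tool_calls` helper are all hypothetical, and a framework such as Qwen-Agent would normally handle this step.

```python
import json
import re

# Assumption: each tool call is a JSON object inside <tool_call>...</tool_call> tags.
_TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def get_stock_price(ticker: str) -> float:
    """Hypothetical tool backed by canned data, for illustration only."""
    prices = {"ACME": 12.5}
    return prices[ticker]


# Registry mapping tool names to Python callables.
TOOLS = {"get_stock_price": get_stock_price}


def run_tool_calls(model_output: str) -> list:
    """Parse every tool call in the output and execute the matching registered tool."""
    results = []
    for payload in _TOOL_CALL_RE.findall(model_output):
        call = json.loads(payload)
        func = TOOLS[call["name"]]  # raises KeyError for unregistered tools
        results.append(func(**call["arguments"]))
    return results


# Hand-written example string standing in for real model output.
output = '<tool_call>\n{"name": "get_stock_price", "arguments": {"ticker": "ACME"}}\n</tool_call>'
print(run_tool_calls(output))
```

An explicit registry means the model can only trigger functions the application has deliberately exposed.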