elyza/ELYZA-Shortcut-1.0-Qwen-7B

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Apr 30, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

ELYZA-Shortcut-1.0-Qwen-7B is a 7.6 billion parameter language model developed by elyza, based on Qwen/Qwen2.5-7B-Instruct, with a 131,072-token context length. The model is post-trained on problem-solution pairs from which the reasoning steps have been removed, so it generates final answers directly instead of reasoning step by step. This makes it suitable for scenarios where explicit reasoning is not required and an efficient, direct answer is preferred.


Overview

ELYZA-Shortcut-1.0-Qwen-7B is a 7.6 billion parameter language model developed by elyza, built upon the Qwen/Qwen2.5-7B-Instruct architecture. Unlike traditional reasoning models, this model is specifically designed to directly generate final answers without performing explicit step-by-step reasoning. It was developed as a non-reasoning counterpart to the ELYZA-Thinking-1.0-Qwen-32B reasoning model.

Key Capabilities

  • Direct Answer Generation: The model is post-trained via supervised fine-tuning (SFT) on problem-solution pairs, where reasoning steps have been removed. This allows it to directly output answers.
  • Efficiency: By bypassing intermediate reasoning steps, the model aims for more direct and potentially faster answer generation for suitable tasks.
  • Qwen Foundation: Leverages the robust base capabilities of the Qwen2.5-7B-Instruct model.

Training Methodology

The post-training involved SFT on data derived from optimal reasoning paths. These paths were first explored with a Monte Carlo Tree Search (MCTS) based algorithm; the intermediate reasoning steps were then removed, leaving direct problem-solution pairs for fine-tuning.
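The data-preparation step can be sketched as follows. This is a minimal illustration, not the actual ELYZA pipeline: the field names (`problem`, `reasoning`, `answer`) and the chat-message output format are assumptions chosen to show the idea of stripping reasoning from a trace.

```python
# Hypothetical sketch: converting a reasoning trace (e.g. one found via
# MCTS-guided search) into a direct problem-solution pair for SFT.
# Field names and schema are illustrative, not the real ELYZA data format.

def to_direct_pair(trace: dict) -> dict:
    """Drop the intermediate reasoning, keeping only problem -> answer."""
    return {
        "messages": [
            {"role": "user", "content": trace["problem"]},
            # The reasoning field is deliberately omitted: the model is
            # trained to emit the final answer directly.
            {"role": "assistant", "content": trace["answer"]},
        ]
    }

trace = {
    "problem": "What is 17 * 24?",
    "reasoning": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408",
    "answer": "408",
}
pair = to_direct_pair(trace)
print(pair)
```

Fine-tuning on such pairs teaches the model to map problems straight to answers, which is what distinguishes it from its reasoning counterpart, ELYZA-Thinking-1.0-Qwen-32B.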

Use Cases

This model is particularly suited for applications where a direct, concise answer is preferred over a detailed explanation or step-by-step reasoning process. It can be integrated using the Hugging Face Transformers library for inference or deployed with vLLM for an OpenAI-compatible server.
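A minimal local-inference sketch with the Transformers library is shown below. The model ID comes from this card; the prompt, dtype, and generation settings are illustrative defaults, not recommendations from the model authors.

```python
# Hedged sketch of local inference with Hugging Face Transformers.
# Generation settings here are illustrative, not tuned values.

def build_messages(question: str) -> list[dict]:
    """Single-turn chat in the message format used by Qwen chat templates."""
    return [{"role": "user", "content": question}]


def main() -> None:
    # Heavy imports kept inside main() so the helper above stays lightweight.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "elyza/ELYZA-Shortcut-1.0-Qwen-7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    prompt = tokenizer.apply_chat_template(
        build_messages("What is the capital of Japan?"),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # The model is trained to answer directly, without reasoning steps.
    output = model.generate(**inputs, max_new_tokens=256)
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

For serving, the same model can be loaded by vLLM to expose an OpenAI-compatible endpoint, in which case the chat-message format above is sent to the server instead of being templated locally.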