amphora/qwen3-4b-think

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Context Length: 32k · Published: Apr 13, 2026 · License: other · Architecture: Transformer

Overview

amphora/qwen3-4b-think is a 4 billion parameter language model fine-tuned from the Qwen/Qwen3-4B-Thinking-2507 base model. It was trained on the combined_reasoning_sft_lt30k dataset, targeting tasks that demand strong reasoning ability. The model supports a context length of 32768 tokens, letting it process long inputs and generate coherent, contextually relevant outputs.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen3-4B-Thinking-2507.
  • Parameter Count: 4 billion parameters.
  • Context Length: 32768 tokens, enabling processing of extensive inputs (see the config check after this list).
  • Training Data: Specialized training on the combined_reasoning_sft_lt30k dataset.
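
As a quick sanity check, the advertised context length can be read directly from the model config, assuming the repository follows the standard Hugging Face Transformers layout:

```python
from transformers import AutoConfig

# Load the config from the Hub and inspect the advertised context length.
# max_position_embeddings is the standard field for Qwen3-style models.
config = AutoConfig.from_pretrained("amphora/qwen3-4b-think")
print(config.max_position_embeddings)  # expected: 32768
```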

Training Details

The model was trained for 3 epochs with a learning rate of 4e-05 and a total batch size of 128 across 8 GPUs, using a cosine learning rate scheduler and the AdamW optimizer. This is a standard supervised fine-tuning recipe for adapting the base model to its target reasoning tasks.
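
As a rough illustration, the reported hyperparameters map onto a Hugging Face TrainingArguments setup as sketched below. The per-device batch size, gradient accumulation steps, and output path are assumptions (the card only states the totals), chosen so that 8 GPUs × 4 × 4 = 128:

```python
from transformers import TrainingArguments

# Sketch of the reported recipe: lr 4e-05, total batch size 128 on 8 GPUs,
# 3 epochs, cosine schedule, AdamW. Per-device batch size and gradient
# accumulation are assumptions that multiply out to the reported total.
training_args = TrainingArguments(
    output_dir="qwen3-4b-think-sft",   # hypothetical output path
    learning_rate=4e-05,
    num_train_epochs=3,
    per_device_train_batch_size=4,     # assumption
    gradient_accumulation_steps=4,     # assumption: 8 GPUs * 4 * 4 = 128
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    bf16=True,                         # matches the BF16 precision listed above
)
```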

Intended Use Cases

This model is particularly well-suited for applications demanding robust logical reasoning and problem-solving. Its fine-tuning on a reasoning-specific dataset makes it a strong candidate for tasks such as complex question answering, logical inference, and analytical text generation.
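
A minimal usage sketch, assuming the model follows the standard Qwen3 chat interface in Hugging Face Transformers (the prompt and generation settings are illustrative, not prescribed by the card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amphora/qwen3-4b-think"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A reasoning-style prompt; thinking models typically emit an internal
# chain of thought before the final answer.
messages = [
    {"role": "user", "content": "If 3 machines make 3 widgets in 3 minutes, "
                                "how long do 100 machines take to make 100 widgets?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because outputs include the model's reasoning trace, budget a generous max_new_tokens for multi-step problems.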