amphora/qwen3-4b-think
amphora/qwen3-4b-think is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Thinking-2507 and optimized for reasoning tasks. It supports a 32768-token context length and was trained on the combined_reasoning_sft_lt30k dataset, making it suitable for applications that require advanced logical inference and problem solving.
Overview
amphora/qwen3-4b-think is a 4-billion-parameter language model fine-tuned from the Qwen/Qwen3-4B-Thinking-2507 base model. It was trained on the combined_reasoning_sft_lt30k dataset, optimizing it for tasks that demand strong reasoning. The model supports a substantial context length of 32768 tokens, allowing it to process longer inputs and generate coherent, contextually relevant outputs.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen3-4B-Thinking-2507.
- Parameter Count: 4 billion parameters.
- Context Length: 32768 tokens, enabling processing of extensive inputs.
- Training Data: Specialized training on the combined_reasoning_sft_lt30k dataset.
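Qwen3 thinking-series models emit their chain-of-thought before the final answer, conventionally terminated by a closing `</think>` tag (the opening tag is typically supplied by the chat template). A minimal sketch of post-processing such output, assuming that convention holds for this fine-tune; the helper name `split_reasoning` is illustrative, not part of any library:

```python
def split_reasoning(output: str) -> tuple[str, str]:
    """Split a thinking-model completion into (reasoning, answer).

    Assumes the Qwen3-Thinking convention where the chain-of-thought
    ends with a </think> tag; if the tag is absent, the whole output
    is treated as the answer.
    """
    head, sep, tail = output.partition("</think>")
    if sep:  # marker found: head is the reasoning, tail is the answer
        return head.strip(), tail.strip()
    return "", output.strip()

completion = "Compare 7*8 and 50: 56 > 50.</think>Yes, 7*8 is greater than 50."
reasoning, answer = split_reasoning(completion)
```

Separating the two segments this way lets an application log or hide the reasoning trace while surfacing only the final answer to users.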
Training Details
The model was trained for 3 epochs with a learning rate of 4e-05 and a total batch size of 128 across 8 GPUs, using a cosine learning-rate scheduler and the AdamW optimizer. These hyperparameters reflect a focused fine-tuning run aimed at the model's target reasoning tasks.
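The schedule above can be sketched numerically. This is a generic cosine-decay curve peaking at the card's reported 4e-05, not the exact trainer code: the warmup length is an assumption (the card does not mention warmup), as is the even 128/8 per-device batch split with no gradient accumulation.

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 4e-05,
              warmup_steps: int = 0) -> float:
    """Cosine-decay learning rate as used by common SFT trainers.

    peak_lr matches the card's reported 4e-05; warmup_steps is an
    assumption -- the card does not state a warmup configuration.
    """
    if warmup_steps and step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup (assumed)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Total batch 128 over 8 GPUs -> 16 per device, assuming no
# gradient accumulation (not stated on the card).
per_device_batch = 128 // 8
```

The learning rate starts at 4e-05 and decays smoothly toward zero over the run, which is the standard behavior of a cosine schedule without a floor value.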
Intended Use Cases
This model is particularly well-suited for applications demanding robust logical reasoning and problem-solving. Its fine-tuning on a reasoning-specific dataset makes it a strong candidate for tasks such as complex question answering, logical inference, and analytical text generation.