amphora/qwen3-8b-tr: Reasoning-Optimized Qwen3-8B
This model is a specialized fine-tune of Qwen/Qwen3-8B-Base, an 8-billion-parameter base model from the Qwen team. It retains the base model's substantial 32,768-token context window, making it capable of handling detailed and lengthy inputs.
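A minimal loading sketch using the transformers library, assuming the checkpoint is published under this ID on the Hugging Face Hub (the `device_map="auto"` option additionally requires the accelerate package):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model identifier as given in this card.
model_id = "amphora/qwen3-8b-tr"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place weights on available GPU(s)
)
```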
Key Capabilities & Training Focus
Fine-tuning was performed on the combined_reasoning_sft_tr dataset, which targets tasks requiring:
- Logical Deduction: Processing information to arrive at conclusions.
- Problem Solving: Addressing complex scenarios through analytical thought.
- Analytical Tasks: Breaking down information and identifying relationships.
Training Details
The model was trained with specific hyperparameters to achieve its specialized performance:
- Learning Rate: 4e-05
- Batch Size: effective batch size of 128 (per-device batch size 4 × gradient accumulation steps 4 × 8 devices)
- Optimizer: adamw_torch_fused (PyTorch's fused AdamW implementation)
- Scheduler: Cosine learning rate decay
- Epochs: 3
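These hyperparameters map directly onto a Hugging Face TrainingArguments configuration. A hypothetical reconstruction is sketched below; only the values listed above are documented, so anything else (warmup, weight decay, output path) is an assumption:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-tr",            # hypothetical output path
    learning_rate=4e-05,
    per_device_train_batch_size=4,       # train_batch_size: 4
    gradient_accumulation_steps=4,       # 4 * 4 * 8 devices = effective batch of 128
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
)
```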
Intended Use Cases
Given its fine-tuning on reasoning data, amphora/qwen3-8b-tr is particularly well-suited for applications where robust logical processing and analytical capabilities are paramount. This includes tasks such as:
- Complex question answering
- Data analysis and interpretation
- Scientific or technical reasoning
- Educational tools requiring logical problem-solving
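For tasks like these, generation follows the standard transformers workflow. The sketch below assumes the tokenizer ships a chat template (as Qwen3 tokenizers do; if this fine-tune of the base model does not include one, fall back to plain-text prompts), and the question shown is just an illustrative example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amphora/qwen3-8b-tr"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Hypothetical multi-step reasoning prompt; adjust to your task.
messages = [{
    "role": "user",
    "content": "A train departs at 9:40 and arrives at 13:05. How long is the trip?",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```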