amphora/qwen3-8b-tr

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 18, 2026 · License: other · Architecture: Transformer

The amphora/qwen3-8b-tr model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B-Base and optimized for reasoning tasks. Its 32,768-token context length makes it suitable for processing extensive inputs, and its fine-tuning targets complex logical and analytical problem solving.


amphora/qwen3-8b-tr: Reasoning-Optimized Qwen3-8B

This model is a specialized fine-tuned version of Qwen/Qwen3-8B-Base, Qwen's 8-billion-parameter base model. It retains the base model's substantial 32,768-token context window, making it capable of handling detailed and lengthy inputs.
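A minimal inference sketch is shown below; it assumes the checkpoint exposes the standard Hugging Face transformers chat interface, which the model card itself does not document, and the example prompt is invented:

```python
# Minimal inference sketch; assumes the standard Hugging Face transformers
# chat interface (not confirmed by the model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amphora/qwen3-8b-tr"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Invented reasoning prompt for illustration.
messages = [
    {"role": "user", "content": "If every A is a B and every B is a C, is every A a C? Explain."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Decode only the newly generated tokens, skipping the prompt.
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```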

Key Capabilities & Training Focus

Fine-tuning focused on the combined_reasoning_sft_tr dataset, indicating optimization for tasks requiring:

  • Logical Deduction: Drawing valid conclusions from stated premises.
  • Problem Solving: Working through complex scenarios step by step.
  • Analytical Tasks: Breaking down information and identifying relationships.

Training Details

The model was trained with specific hyperparameters to achieve its specialized performance (a hypothetical configuration sketch follows the list):

  • Learning Rate: 4e-05
  • Batch Size: A total effective batch size of 128 (train_batch_size: 4 × gradient_accumulation_steps: 4 × num_devices: 8)
  • Optimizer: ADAMW_TORCH_FUSED
  • Scheduler: Cosine learning rate schedule
  • Epochs: 3
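
The hyperparameter names above match Hugging Face TrainingArguments fields, so a plausible reconstruction of the configuration looks like the sketch below (the actual training script is not published; output_dir is a placeholder):

```python
# Hypothetical reconstruction of the listed hyperparameters using Hugging Face
# TrainingArguments; the actual training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-tr-sft",    # placeholder path, not from the card
    learning_rate=4e-5,
    per_device_train_batch_size=4,   # 4 x 4 grad accum x 8 devices = 128 effective
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",       # ADAMW_TORCH_FUSED
)
```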

Intended Use Cases

Given its fine-tuning on reasoning data, amphora/qwen3-8b-tr is particularly well-suited for applications where robust logical processing and analytical capability are paramount, including tasks such as the following (a brief prompting sketch appears after the list):

  • Complex question answering
  • Data analysis and interpretation
  • Scientific or technical reasoning
  • Educational tools requiring logical problem-solving
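
As an illustration of the question-answering use case, a reasoning prompt can be sent through the transformers text-generation pipeline as sketched below (the prompt and pipeline usage are illustrative assumptions, not part of the model card):

```python
# Illustrative reasoning prompt via the transformers pipeline API;
# the question is an invented example.
from transformers import pipeline

generator = pipeline("text-generation", model="amphora/qwen3-8b-tr", device_map="auto")

messages = [
    {"role": "user", "content": "Sales rose 12% in Q1 and fell 5% in Q2. "
                                "What is the net change over the half year? Show your reasoning."}
]
result = generator(messages, max_new_tokens=256)
# The pipeline returns the full chat; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```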