amphora/qwen3-8b-base-65k

TEXT GENERATION

  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Ctx Length: 32k
  • Published: Apr 12, 2026
  • License: other
  • Architecture: Transformer

amphora/qwen3-8b-base-65k is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B-Base. This model is specifically optimized for reasoning tasks, having been trained on the combined_reasoning_sft_lt65k dataset. It features a context length of 32768 tokens, making it suitable for applications requiring extensive contextual understanding in reasoning-focused scenarios.


Model Overview

The model is derived from the Qwen3-8B-Base architecture and fine-tuned on the combined_reasoning_sft_lt65k dataset, indicating a specialization in reasoning tasks. It supports a context length of 32768 tokens, allowing it to process and understand longer sequences of text.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen3-8B-Base.
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports up to 32768 tokens.
  • Specialization: Optimized for reasoning tasks through specific dataset training.

Training Details

The model was trained with a learning rate of 4e-05, a total batch size of 128, and utilized the AdamW optimizer. Training spanned 3 epochs with a cosine learning rate scheduler. The framework versions used include Transformers 5.2.0, Pytorch 2.11.0+cu130, Datasets 4.0.0, and Tokenizers 0.22.2.
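For illustration, the reported hyperparameters can be sketched as a training configuration. Only the totals come from this card (learning rate 4e-05, total batch size 128, AdamW, 3 epochs, cosine schedule); the per-device batch size, gradient accumulation steps, and GPU count below are assumptions chosen so the product matches the reported total.

```python
# Hypothetical breakdown of the reported training setup.
# Only the totals are stated on the model card; the split into
# 8 GPUs x per-device batch 4 x grad-accum 4 is an assumption.
train_config = {
    "learning_rate": 4e-05,
    "num_train_epochs": 3,
    "lr_scheduler_type": "cosine",
    "optim": "adamw_torch",
    "per_device_train_batch_size": 4,   # assumption
    "gradient_accumulation_steps": 4,   # assumption
    "num_gpus": 8,                      # assumption
}

def effective_batch_size(cfg: dict) -> int:
    """Total batch size = per-device batch x grad accumulation x GPU count."""
    return (cfg["per_device_train_batch_size"]
            * cfg["gradient_accumulation_steps"]
            * cfg["num_gpus"])

assert effective_batch_size(train_config) == 128  # matches the reported total
```

Any per-device/accumulation split with the same product would be consistent with the card; the check above only verifies the reported total of 128.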

Intended Use Cases

Given its fine-tuning on a reasoning dataset, this model is best suited for applications that require strong logical inference, problem-solving, and understanding complex relationships within text. Its extended context window further enhances its utility for detailed analytical tasks.
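Because the usable context is capped at 32768 tokens, prompt and generation lengths must be budgeted together. A minimal sketch of that bookkeeping, assuming the caller already knows the prompt's token count (the helper name is illustrative, not part of any model API):

```python
MAX_CONTEXT = 32768  # context length reported for amphora/qwen3-8b-base-65k

def generation_budget(prompt_tokens: int, max_context: int = MAX_CONTEXT) -> int:
    """Return how many tokens remain for generation after the prompt.

    Raises ValueError if the prompt alone exceeds the context window.
    """
    if prompt_tokens > max_context:
        raise ValueError(
            f"prompt of {prompt_tokens} tokens exceeds the "
            f"{max_context}-token context window"
        )
    return max_context - prompt_tokens

# e.g. a 30,000-token analytical document leaves 2,768 tokens for the answer
assert generation_budget(30_000) == 2_768
```

For long analytical inputs near the window limit, reserving a fixed generation budget up front and truncating the input to fit is a common pattern.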