amphora/qwen3-8b-base-30k
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 9, 2026 · License: other · Architecture: Transformer

The amphora/qwen3-8b-base-30k model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B-Base. It was trained on the combined_reasoning_sft_lt30k dataset, indicating an optimization for reasoning tasks, and supports a context length of 32768 tokens, making it suitable for processing extensive inputs and complex logical sequences.


Model Overview

This model, amphora/qwen3-8b-base-30k, is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B-Base. Its fine-tuning on the combined_reasoning_sft_lt30k dataset suggests a specialization in tasks requiring strong reasoning capabilities.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen3-8B-Base.
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports a substantial context window of 32768 tokens, enabling the processing of long and complex inputs.
  • Training Data: Fine-tuned on the combined_reasoning_sft_lt30k dataset, indicating a focus on reasoning-oriented tasks.
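The characteristics above imply the checkpoint can be used like any causal language model on the Hugging Face Hub. A minimal inference sketch, assuming the standard AutoModel classes apply to this checkpoint (the card does not show usage code, and the prompt below is purely illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "amphora/qwen3-8b-base-30k"
MAX_CONTEXT = 32768  # token limit stated on the model card


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the checkpoint and return a completion (downloads the 8B weights)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Since this is a base-model fine-tune, plain-text prompting as above is the safe default; whether a chat template is bundled with the tokenizer would need to be checked on the repository itself.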

Training Details

The model was trained with a learning rate of 4e-05, a per-device batch size of 8, and an effective batch size of 128 achieved through gradient accumulation, using the fused AdamW optimizer (adamw_torch_fused). Training ran for 3 epochs with a cosine learning-rate scheduler. The training environment used Transformers 5.2.0, PyTorch 2.11.0+cu130, Datasets 4.0.0, and Tokenizers 0.22.2.
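These hyperparameters map onto a Hugging Face `TrainingArguments`-style configuration. A sketch reconstructed from the numbers above, where the accumulation-step count is derived assuming a single training device (the card does not state the device count, so on N GPUs this value would shrink by a factor of N):

```python
# Hyperparameters as reported on the model card.
learning_rate = 4e-05
per_device_train_batch_size = 8
total_train_batch_size = 128
num_train_epochs = 3

# Assumption: single device, so the effective batch size of 128
# comes entirely from gradient accumulation.
gradient_accumulation_steps = total_train_batch_size // per_device_train_batch_size

training_args = {
    "learning_rate": learning_rate,
    "per_device_train_batch_size": per_device_train_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "num_train_epochs": num_train_epochs,
    "lr_scheduler_type": "cosine",
    "optim": "adamw_torch_fused",
}
```

This dict can be splatted into `transformers.TrainingArguments(output_dir=..., **training_args)` to approximate the reported setup.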

Potential Use Cases

Given its fine-tuning on a reasoning-focused dataset and large context window, this model is likely well-suited for applications requiring:

  • Complex problem-solving.
  • Logical inference and deduction.
  • Understanding and generating coherent, extended narratives or arguments.