amphora/qwen3-8b-base-65k
amphora/qwen3-8b-base-65k is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B-Base on the combined_reasoning_sft_lt65k dataset, with a focus on reasoning tasks. It supports a context length of 32768 tokens, making it suitable for reasoning-focused applications that require extensive context.
Model Overview
amphora/qwen3-8b-base-65k is derived from the Qwen3-8B-Base architecture and fine-tuned on the combined_reasoning_sft_lt65k dataset, a supervised fine-tuning corpus oriented toward reasoning. The 32768-token context length allows the model to process long reasoning traces and extended documents in a single pass.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen3-8B-Base.
- Parameter Count: 8 billion parameters.
- Context Length: Supports up to 32768 tokens.
- Specialization: Optimized for reasoning tasks through specific dataset training.
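The checkpoint should load with the standard transformers causal language model API. The snippet below is a minimal sketch, assuming the model is hosted on the Hugging Face Hub under the amphora/qwen3-8b-base-65k ID and that the accelerate package is installed for device placement:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the checkpoint is available on the Hugging Face Hub under this ID.
model_id = "amphora/qwen3-8b-base-65k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # spread the 8B weights across available devices
)
```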
Training Details
The model was trained with a learning rate of 4e-05, a total batch size of 128, and the AdamW optimizer, over 3 epochs with a cosine learning rate scheduler. The reported framework versions are Transformers 5.2.0, PyTorch 2.11.0+cu130, Datasets 4.0.0, and Tokenizers 0.22.2.
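The training script is not published with the card, but the reported hyperparameters map naturally onto the Hugging Face TrainingArguments API. The sketch below is a hypothetical reconstruction: the per-device batch size, gradient accumulation split, and precision setting are assumptions, since only the total batch size of 128 is reported.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters; the actual
# training configuration is not included in the model card.
training_args = TrainingArguments(
    output_dir="qwen3-8b-base-65k",
    learning_rate=4e-5,               # reported learning rate
    per_device_train_batch_size=8,    # assumed; only the total batch size (128) is reported
    gradient_accumulation_steps=4,    # assumed split: 8 per device * 4 accumulation * 4 GPUs = 128
    num_train_epochs=3,               # reported number of epochs
    lr_scheduler_type="cosine",       # reported scheduler
    optim="adamw_torch",              # reported optimizer (AdamW)
    bf16=True,                        # assumed mixed precision, not stated in the card
)
```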
Intended Use Cases
Given its fine-tuning on a reasoning dataset, this model is best suited for applications that require strong logical inference, problem-solving, and understanding complex relationships within text. Its extended context window further enhances its utility for detailed analytical tasks.
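As a usage sketch, a single-turn reasoning prompt can be run through model.generate, reusing the model and tokenizer loaded above. The card does not state whether the checkpoint expects a chat template, so this example uses a plain text prompt; the prompt itself is illustrative only.

```python
prompt = (
    "A train leaves the station at 9:00 travelling at 60 km/h. "
    "A second train leaves the same station at 10:00 at 90 km/h. "
    "At what time does the second train catch up? Think step by step."
)

# Tokenize the prompt and generate a completion on the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```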