Model Overview
This model, amphora/qwen3-8b-base-30k, is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B-Base. It was trained on the combined_reasoning_sft_lt30k dataset, suggesting a specialization in tasks requiring strong reasoning capabilities.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen3-8B-Base.
- Parameter Count: 8 billion parameters.
- Context Length: Supports a substantial context window of 32768 tokens, enabling the processing of long and complex inputs.
- Training Data: Fine-tuned on the combined_reasoning_sft_lt30k dataset, indicating a focus on reasoning-oriented tasks.
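Since the card lists no usage snippet, here is a minimal inference sketch using the standard transformers API. The prompt and generation settings are illustrative assumptions, not values from the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "amphora/qwen3-8b-base-30k"

# Load the tokenizer and model; device_map="auto" places weights on
# available accelerators, and torch_dtype="auto" uses the checkpoint's dtype.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

# Illustrative reasoning-style prompt (not from the card).
prompt = "If all squares are rectangles and this shape is a square, then"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# max_new_tokens is an arbitrary illustrative choice.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As a base-model fine-tune, the checkpoint may work best with plain-text completion prompts like the one above rather than a chat template.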
Training Details
The model was trained with a learning rate of 4e-05, a per-device batch size of 8 (effective batch size of 128 via gradient accumulation), and the ADAMW_TORCH_FUSED optimizer. Training ran for 3 epochs with a cosine learning-rate scheduler. The training environment included Transformers 5.2.0, PyTorch 2.11.0+cu130, Datasets 4.0.0, and Tokenizers 0.22.2.
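The reported settings can be collected into a single configuration for reference. This is a hypothetical reconstruction: the gradient accumulation figure assumes a single training device (128 effective / 8 per-device = 16), which the card does not state:

```python
# Reconstruction of the reported training hyperparameters.
# gradient_accumulation_steps is an assumption: it is derived as
# effective batch (128) / per-device batch (8) under a single device.
hyperparams = {
    "learning_rate": 4e-05,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 16,  # assumed, not stated in the card
    "effective_train_batch_size": 128,
    "num_train_epochs": 3,
    "lr_scheduler_type": "cosine",
    "optim": "adamw_torch_fused",
}

# Sanity check: per-device batch * accumulation steps = effective batch.
effective = (
    hyperparams["per_device_train_batch_size"]
    * hyperparams["gradient_accumulation_steps"]
)
print(effective)  # 128
```

If training used multiple GPUs, the accumulation steps would shrink proportionally (e.g. 8 GPUs would give 2 accumulation steps for the same effective batch of 128).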
Potential Use Cases
Given its fine-tuning on a reasoning-focused dataset and large context window, this model is likely well-suited for applications requiring:
- Complex problem-solving.
- Logical inference and deduction.
- Understanding and generating coherent, extended narratives or arguments.