amphora/qwen3-8b-base-65k
amphora/qwen3-8b-base-65k is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B-Base on the combined_reasoning_sft_lt65k dataset, with a focus on reasoning tasks. It supports a context length of 32768 tokens, making it suitable for reasoning-focused applications that require extensive context.
Model Overview
amphora/qwen3-8b-base-65k is derived from the Qwen3-8B-Base architecture and fine-tuned on the combined_reasoning_sft_lt65k dataset, a supervised fine-tuning corpus oriented toward reasoning. The 32768-token context length allows the model to process long reasoning traces and extended documents in a single pass.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen3-8B-Base.
- Parameter Count: 8 billion parameters.
- Context Length: Supports up to 32768 tokens.
- Specialization: Optimized for reasoning tasks through specific dataset training.
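The checkpoint should load with the standard transformers causal language model API. The snippet below is a minimal sketch, assuming the model is hosted on the Hugging Face Hub under the amphora/qwen3-8b-base-65k ID and that the accelerate package is installed for device placement:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the checkpoint is available on the Hugging Face Hub under this ID.
model_id = "amphora/qwen3-8b-base-65k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # spread the 8B weights across available devices
)
```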
Training Details
The model was trained with a learning rate of 4e-05, a total batch size of 128, and the AdamW optimizer, over 3 epochs with a cosine learning rate scheduler. The reported framework versions are Transformers 5.2.0, PyTorch 2.11.0+cu130, Datasets 4.0.0, and Tokenizers 0.22.2.
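The training script is not published with the card, but the reported hyperparameters map naturally onto the Hugging Face TrainingArguments API. The sketch below is a hypothetical reconstruction: the per-device batch size, gradient accumulation split, and precision setting are assumptions, since only the total batch size of 128 is reported.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters; the actual
# training configuration is not included in the model card.
training_args = TrainingArguments(
    output_dir="qwen3-8b-base-65k",
    learning_rate=4e-5,               # reported learning rate
    per_device_train_batch_size=8,    # assumed; only the total batch size (128) is reported
    gradient_accumulation_steps=4,    # assumed split: 8 per device * 4 accumulation * 4 GPUs = 128
    num_train_epochs=3,               # reported number of epochs
    lr_scheduler_type="cosine",       # reported scheduler
    optim="adamw_torch",              # reported optimizer (AdamW)
    bf16=True,                        # assumed mixed precision, not stated in the card
)
```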
Intended Use Cases
Given its fine-tuning on a reasoning dataset, this model is best suited for applications that require strong logical inference, problem-solving, and understanding complex relationships within text. Its extended context window further enhances its utility for detailed analytical tasks.
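As a usage sketch, a single-turn reasoning prompt can be run through model.generate, reusing the model and tokenizer loaded above. The card does not state whether the checkpoint expects a chat template, so this example uses a plain text prompt; the prompt itself is illustrative only.

```python
prompt = (
    "A train leaves the station at 9:00 travelling at 60 km/h. "
    "A second train leaves the same station at 10:00 at 90 km/h. "
    "At what time does the second train catch up? Think step by step."
)

# Tokenize the prompt and generate a completion on the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```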