Model Overview
This model is a fine-tuned version of Qwen3-8B (8 billion parameters) with a 32,768-token context length. It was trained for 7 epochs at a learning rate of 4e-05, with a total batch size of 96 across 32 GPUs.
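The batch arithmetic above can be sanity-checked with a quick sketch. Note the gradient-accumulation setting is an assumption (the card does not state it), so this only shows the per-GPU micro-batch implied by uniform sharding:

```python
# Hedged sketch: derive the per-GPU micro-batch size from the reported totals.
total_batch_size = 96   # sequences per optimizer step, as reported
num_gpus = 32           # as reported
grad_accum_steps = 1    # assumption; not specified in the card

# With uniform sharding, each GPU processes this many sequences per step.
per_gpu_batch = total_batch_size // (num_gpus * grad_accum_steps)
print(per_gpu_batch)  # → 3
```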
Key Specialization
The model's distinguishing characteristic is its fine-tuning on the /e/data1/datasets/playground/ot/hf_hub/datasets--penfever--stackexchange-tezos-sandboxes__Kimi-2.5-smaxeps-32k/snapshots/33375d18f3a1d98976944789905e380fce397c46_thinking_preprocessed dataset, which indicates a strong focus on:
- Tezos blockchain: Understanding and generating information about the Tezos platform.
- StackExchange data: Leveraging the question-and-answer format and technical discussions typical of StackExchange.
- Sandbox environments: likely adept at handling queries about development and testing environments within the Tezos ecosystem.
Training Details
- Base Model: Qwen/Qwen3-8B
- Learning Rate: 4e-05
- Optimizer: AdamW_Torch_Fused
- Epochs: 7.0
- Context Length: 32,768 tokens
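The hyperparameters above can be collected into a single config sketch. The field names below are illustrative (loosely modeled on common Hugging Face-style training arguments); the actual training configuration has not been published:

```python
# Illustrative only: field names are assumptions, values come from the card.
training_config = {
    "model_name_or_path": "Qwen/Qwen3-8B",  # base model
    "learning_rate": 4e-05,
    "optim": "adamw_torch_fused",           # AdamW_Torch_Fused
    "num_train_epochs": 7.0,
    "max_seq_length": 32768,                # context length in tokens
    "total_batch_size": 96,                 # across 32 GPUs
}

for key, value in training_config.items():
    print(f"{key}: {value}")
```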
Potential Use Cases
Given its training data, this model is likely well-suited to applications requiring deep knowledge of, or generation within, the Tezos blockchain domain, particularly technical support, documentation, and Q&A related to Tezos sandboxes.