Model Overview
This model is a fine-tuned version of Qwen3-8B (8 billion parameters) with a 32,768-token context length. It was trained for 7 epochs at a learning rate of 4e-05, with a total batch size of 96 across 32 GPUs.
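The batch arithmetic above can be sanity-checked with a quick sketch. Note the gradient-accumulation setting is an assumption (the card does not state it), so this only shows the per-GPU micro-batch implied by uniform sharding:

```python
# Hedged sketch: derive the per-GPU micro-batch size from the reported totals.
total_batch_size = 96   # sequences per optimizer step, as reported
num_gpus = 32           # as reported
grad_accum_steps = 1    # assumption; not specified in the card

# With uniform sharding, each GPU processes this many sequences per step.
per_gpu_batch = total_batch_size // (num_gpus * grad_accum_steps)
print(per_gpu_batch)  # → 3
```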
Key Specialization
The model's distinguishing characteristic is its fine-tuning on the /e/data1/datasets/playground/ot/hf_hub/datasets--penfever--stackexchange-tezos-sandboxes__Kimi-2.5-smaxeps-32k/snapshots/33375d18f3a1d98976944789905e380fce397c46_thinking_preprocessed dataset, which indicates a strong focus on:
- Tezos blockchain: Understanding and generating information about the Tezos platform.
- StackExchange data: Leveraging the question-and-answer format and technical discussions typical of StackExchange.
- Sandbox environments: likely adept at handling queries about development and testing environments within the Tezos ecosystem.
Training Details
- Base Model: Qwen/Qwen3-8B
- Learning Rate: 4e-05
- Optimizer: AdamW_Torch_Fused
- Epochs: 7.0
- Context Length: 32,768 tokens
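The hyperparameters above can be collected into a single config sketch. The field names below are illustrative (loosely modeled on common Hugging Face-style training arguments); the actual training configuration has not been published:

```python
# Illustrative only: field names are assumptions, values come from the card.
training_config = {
    "model_name_or_path": "Qwen/Qwen3-8B",  # base model
    "learning_rate": 4e-05,
    "optim": "adamw_torch_fused",           # AdamW_Torch_Fused
    "num_train_epochs": 7.0,
    "max_seq_length": 32768,                # context length in tokens
    "total_batch_size": 96,                 # across 32 GPUs
}

for key, value in training_config.items():
    print(f"{key}: {value}")
```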
Potential Use Cases
Given its training data, this model is likely well-suited to applications requiring deep knowledge of, or generation within, the Tezos blockchain domain, particularly technical support, documentation, and Q&A related to Tezos sandboxes.