Model Overview
This model, `exp-syh-tezos-stackoverflow-mixed_glm_4_7_traces_jupiter_cleaned`, is an 8-billion-parameter language model built on the Qwen/Qwen3-8B architecture. It was fine-tuned on a specialized dataset stored at `/data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/datasets--DCAgent--exp-syh-tezos-stackoverflow-mixed_glm_4.7_traces_jupiter_cleaned/snapshots/d44d8f79145236ce933ae10c8e21ee822ff82165_thinking_preprocessed`.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen3-8B, a robust foundation model.
- Specialized Fine-tuning: The training data, derived from Tezos and Stack Overflow traces, indicates a focus on technical content, potentially related to blockchain technology and programming Q&A.
- Parameter Count: Features 8 billion parameters, offering a balance between capability and computational efficiency.
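For reference, the snippet below sketches how the model could be loaded with the Hugging Face `transformers` library. The repo id is a hypothetical placeholder inferred from the model name and should be replaced with the checkpoint's actual location.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, inferred from the model name; adjust to the real path.
model_id = "DCAgent/exp-syh-tezos-stackoverflow-mixed_glm_4_7_traces_jupiter_cleaned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # spread the 8B model across available devices
)
```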
Training Details
The model was trained with the following hyperparameters:
- Learning Rate: 4e-05
- Batch Size: A total train batch size of 16 (1 per device × 8 devices × 2 gradient accumulation steps).
- Epochs: Trained for 7.0 epochs.
- Optimizer: Utilized ADAMW_TORCH_FUSED with specific beta and epsilon values.
- Scheduler: Employed a cosine learning rate scheduler with a 0.1 warmup ratio.
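As a rough sketch, these hyperparameters correspond to a `transformers` `TrainingArguments` configuration like the one below. The beta and epsilon values shown are standard AdamW defaults, assumed here because the card does not state them explicitly.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # 1 per device
    gradient_accumulation_steps=2,   # x 8 devices = total batch size 16
    num_train_epochs=7.0,
    optim="adamw_torch_fused",
    adam_beta1=0.9,                  # assumed AdamW default
    adam_beta2=0.999,                # assumed AdamW default
    adam_epsilon=1e-8,               # assumed AdamW default
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```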
Potential Use Cases
Given its specialized fine-tuning, this model is likely well-suited for applications requiring an understanding of:
- Blockchain-related content: Particularly within the Tezos ecosystem.
- Technical Q&A: Generating or understanding responses similar to those found on Stack Overflow.
- Code-related text: Processing and generating programming-centric information.
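As an illustration, a Stack Overflow-style query could be run as shown below. This reuses the `model` and `tokenizer` from the loading sketch above; the prompt is purely hypothetical.

```python
# Hypothetical Tezos-flavored technical question.
messages = [
    {"role": "user", "content": "How do I read the storage of a Tezos smart contract?"}
]

# Build the prompt with the model's chat template and generate a reply.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```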