laion/exp-syh-tezos-stackoverflow-mixed_glm_4_7_traces_jupiter_cleaned

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 8B
  • Quantization: FP8
  • Context length: 32k
  • Tool calling: Supported
  • Published: Feb 27, 2026
  • License: apache-2.0
  • Architecture: Transformer (open weights)

The laion/exp-syh-tezos-stackoverflow-mixed_glm_4_7_traces_jupiter_cleaned model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was adapted on a dataset derived from Tezos and Stack Overflow traces, which suggests a focus on blockchain development and technical Q&A. The fine-tuning is intended to improve performance in these domain-specific contexts, particularly technical information retrieval and generation.


Model Overview

This model, exp-syh-tezos-stackoverflow-mixed_glm_4_7_traces_jupiter_cleaned, is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B checkpoint. Its training data is a preprocessed copy of revision d44d8f79145236ce933ae10c8e21ee822ff82165 of the DCAgent/exp-syh-tezos-stackoverflow-mixed_glm_4.7_traces_jupiter_cleaned dataset from the Hugging Face Hub, given as the local path /data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/datasets--DCAgent--exp-syh-tezos-stackoverflow-mixed_glm_4.7_traces_jupiter_cleaned/snapshots/d44d8f79145236ce933ae10c8e21ee822ff82165_thinking_preprocessed.
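The dataset path above follows the Hugging Face hub cache layout, in which a repository is stored under a `<repo type>s--<namespace>--<name>` directory and each snapshot sits under `snapshots/<commit hash>`. The sketch below is a hypothetical helper (`parse_hub_cache_path` is not a library function) that recovers the dataset ID and revision from such a path. It assumes the standard layout and treats anything after the 40-character commit hash (here `_thinking_preprocessed`) as a local suffix.

```python
from pathlib import PurePosixPath

# Hypothetical helper: decode a Hugging Face hub cache path into its parts.
# Assumes the layout .../hub/<type>s--<namespace>--<name>/snapshots/<snapshot>.
def parse_hub_cache_path(path: str) -> dict:
    parts = PurePosixPath(path).parts
    snap_idx = parts.index("snapshots")
    repo_dir = parts[snap_idx - 1]  # e.g. "datasets--DCAgent--exp-syh-..."
    repo_type, namespace, name = repo_dir.split("--", 2)
    snapshot = parts[snap_idx + 1]
    # Standard snapshot folders are bare 40-character commit hashes; any
    # trailing text is returned separately as a local suffix.
    revision, suffix = snapshot[:40], snapshot[40:]
    return {
        "repo_type": repo_type.removesuffix("s"),
        "repo_id": f"{namespace}/{name}",
        "revision": revision,
        "suffix": suffix,
    }

PATH = (
    "/data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/"
    "datasets--DCAgent--exp-syh-tezos-stackoverflow-mixed_glm_4.7_traces_jupiter_cleaned/"
    "snapshots/d44d8f79145236ce933ae10c8e21ee822ff82165_thinking_preprocessed"
)

info = parse_hub_cache_path(PATH)
print(info["repo_id"], info["revision"], info["suffix"])
```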

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen3-8B, the 8B dense model in the Qwen3 family.
  • Specialized Fine-tuning: The training data, derived from Tezos and Stack Overflow traces, indicates a focus on technical content, potentially related to blockchain technology and programming Q&A.
  • Parameter Count: 8 billion parameters; the hosted version is listed as FP8-quantized with a 32k-token context length, which keeps serving costs moderate for a model of this capability.
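Because the hosted version is listed as FP8, a back-of-the-envelope estimate shows what that means for weight memory: storage scales with parameter count times bytes per parameter. The sketch below assumes a nominal 8 × 10^9 parameters, one byte per weight for FP8 and two for BF16; `weight_memory_gb` is a hypothetical helper. It ignores the KV cache (which grows with the 32k context), activations, quantization scale factors, and runtime overhead, so real serving memory is higher.

```python
# Back-of-the-envelope weight memory: parameters x bytes per parameter.
# Assumes a nominal 8e9 parameters; excludes KV cache, activations,
# quantization scale factors, and framework overhead.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes) for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

NOMINAL_PARAMS = 8e9  # "8B" as listed; the exact parameter count may differ slightly

for dtype in ("bf16", "fp8"):
    print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.0f} GB of weights")
```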

Training Details

The model was trained with the following hyperparameters:

  • Learning Rate: 4e-05
  • Batch Size: A total train batch size of 16 (1 per device, 8 devices, 2 gradient accumulation steps).
  • Epochs: Trained for 7.0 epochs.
  • Optimizer: AdamW, PyTorch fused implementation (ADAMW_TORCH_FUSED), with beta and epsilon values that are not reproduced in this summary.
  • Scheduler: Cosine learning-rate schedule with a warmup ratio of 0.1, i.e. warmup over the first 10% of training steps.
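The batch-size arithmetic and the learning-rate schedule above can be sketched in a few lines. This is a minimal re-implementation of the usual warmup-plus-cosine-decay shape, assuming the Hugging Face Trainer conventions that the ADAMW_TORCH_FUSED optimizer name suggests (linear warmup from 0, then one half-cosine decay to 0, with warmup steps taken as the ceiling of total steps times the ratio). The helper names and `TOTAL_STEPS = 1000` are hypothetical; the real step count depends on the dataset size, which is not given here.

```python
import math

# Hyperparameters as listed in the training details.
LEARNING_RATE = 4e-5
PER_DEVICE_BATCH = 1
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1

def effective_batch_size(per_device: int, devices: int, accum: int) -> int:
    # Each optimizer step sees per_device * devices * accum examples.
    return per_device * devices * accum

def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = LEARNING_RATE,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay from peak_lr to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print(effective_batch_size(PER_DEVICE_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))  # 16

TOTAL_STEPS = 1000  # hypothetical total number of optimizer steps
for s in (0, 50, 100, 550, 1000):
    print(s, cosine_lr_with_warmup(s, TOTAL_STEPS))
```

With these assumptions the rate peaks at 4e-05 at the end of warmup, falls to half of that midway through the decay phase, and reaches zero at the final step.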

Potential Use Cases

Given its specialized fine-tuning, this model is likely well-suited for applications requiring an understanding of:

  • Blockchain-related content: Particularly within the Tezos ecosystem.
  • Technical Q&A: Generating or understanding responses similar to those found on Stack Overflow.
  • Code-related text: Processing and generating programming-centric information.