laion/stackexchange-tezos-sandboxes_glm_4_6_traces_together

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Dec 19, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

This is an 8-billion-parameter language model fine-tuned by laion from Qwen/Qwen3-8B on the DCAgent/stackexchange-tezos-sandboxes_glm_4.6_traces_together dataset. With a 32,768-token context length, it is specialized for Tezos blockchain and StackExchange content, and should be well suited to generating and understanding technical discussions in that domain.


Model Overview

This model is a fine-tuned version of the Qwen3-8B architecture, developed by laion. It has been specifically adapted using the DCAgent/stackexchange-tezos-sandboxes_glm_4.6_traces_together dataset, indicating a specialization in content related to the Tezos blockchain and StackExchange discussions.

Training Details

The model was trained with a learning rate of 4e-05 and a per-device batch size of 1 across 8 GPUs; with gradient accumulation, the effective batch size was 16. It used the fused AdamW optimizer (adamw_torch_fused) with a cosine learning-rate scheduler over 7 epochs. The training environment included Transformers 4.56.1 and PyTorch 2.9.0+cu128.
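The hyperparameters above can be sketched as a Hugging Face `TrainingArguments`-style configuration. This is a reconstruction, not the authors' actual training script: the gradient accumulation value is inferred from the reported effective batch size (16 / (1 per device × 8 GPUs) = 2), and the key names are only assumed to follow `transformers` conventions.

```python
# Reported setup: 8 GPUs, batch size 1 per device, effective batch size 16.
per_device_train_batch_size = 1
num_gpus = 8
gradient_accumulation_steps = 2  # inferred: 16 / (1 * 8)

effective_batch_size = (
    per_device_train_batch_size * num_gpus * gradient_accumulation_steps
)
print(effective_batch_size)  # 16

# Hypothetical TrainingArguments-style dict mirroring the reported values.
training_config = {
    "learning_rate": 4e-05,
    "per_device_train_batch_size": per_device_train_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 7,
}
```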

Potential Use Cases

  • Generating responses or summaries for technical questions on Tezos-related StackExchange forums.
  • Analyzing and extracting information from discussions about Tezos sandbox environments (the dataset name suggests the training data consists of traces generated with GLM-4.6).
  • Assisting developers with queries specific to the Tezos ecosystem based on community knowledge.
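For the use cases above, the model can be loaded with the standard `transformers` text-generation API. This is a minimal sketch under assumptions: the repository ID is taken from this page, the prompt format is a plain instruction-style string (the model's actual chat template, inherited from Qwen3-8B, may differ), and loading the 8B weights requires a GPU for practical use.

```python
MODEL_ID = "laion/stackexchange-tezos-sandboxes_glm_4_6_traces_together"


def build_prompt(question: str) -> str:
    # Hypothetical plain-text prompt; the fine-tune's real chat
    # template may differ from this simple format.
    return f"Question about Tezos:\n{question}\nAnswer:"


def answer(question: str, max_new_tokens: int = 256) -> str:
    # Imports kept local so build_prompt stays usable without
    # transformers installed; first call downloads ~8B of weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(answer("How do I start a local Tezos sandbox node?"))
```

The heavy model load is deferred into `answer` so that the prompt helper can be reused (e.g. with a hosted inference endpoint) without pulling the weights locally.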