Model Overview
This model, exp-uns-tezos-128unique_glm_4_7_traces_jupiter_cleaned, is a fine-tuned version of Qwen3-8B developed by laion. Starting from the 8-billion-parameter base model, it was further trained on the dataset /data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/datasets--DCAgent--exp-uns-tezos-128unique_glm_4.7_traces_jupiter_cleaned/snapshots/15d9bb777f344d6d68d8ac555191c073b7c900e7_thinking_preprocessed. The dataset name suggests the model targets tasks involving Tezos blockchain traces or similar trace-data analysis.
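Since this card does not include a usage snippet, here is a minimal loading sketch using the standard Transformers API. The repository id is an assumption inferred from the developer and model name, not confirmed by this card; substitute the actual id.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id (developer "laion" + model name); replace with the real one.
model_id = "laion/exp-uns-tezos-128unique_glm_4_7_traces_jupiter_cleaned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # requires `accelerate`; spreads layers across available GPUs
)

prompt = "Summarize what a Tezos transaction trace records."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```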
Training Details
The fine-tuning process used the following key hyperparameters (see the configuration sketch after this list):
- Learning Rate: 4e-05
- Batch Size: 1 (train), 8 (eval)
- Gradient Accumulation Steps: 2, giving an effective training batch size of 16 (1 per device × 2 accumulation steps × 8 GPUs).
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08.
- LR Scheduler: Cosine type with a warmup ratio of 0.1.
- Epochs: 7.0
- Distributed Training: Multi-GPU setup using 8 devices.
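These hyperparameters map directly onto the Hugging Face Trainer API. The following is a reconstruction sketch, not the original training script; the output path is a placeholder.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                  # placeholder path
    learning_rate=4e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,     # 1 (per device) x 2 (accum) x 8 GPUs = 16 effective
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7.0,
)
```

Launched with `torchrun --nproc_per_node 8` (or a similar launcher), this would correspond to the multi-GPU setup described above.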
Framework Versions
The model was trained using the following library versions (a quick compatibility check is sketched after the list):
- Transformers 4.57.6
- PyTorch 2.9.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.2
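To verify a local environment against these versions, a small check script can help. This is a sketch only; looser version pins may work in practice.

```python
import datasets
import tokenizers
import torch
import transformers

# Versions listed in this card.
expected = {
    "transformers": transformers.__version__ == "4.57.6",
    "torch": torch.__version__ == "2.9.0+cu128",
    "datasets": datasets.__version__ == "4.4.1",
    "tokenizers": tokenizers.__version__ == "0.22.2",
}

for name, matches in expected.items():
    print(f"{name}: {'OK' if matches else 'version differs from card'}")
```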
Intended Use & Limitations
The card does not document specific intended uses or known limitations. Given its fine-tuning on Tezos-related trace data, the model is most likely suited to tasks in that domain; further evaluation would be needed to assess its capabilities and appropriate applications elsewhere.