waleko/Qwen3-8B-SFT-envbench_qwen-all

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Mar 29, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

waleko/Qwen3-8B-SFT-envbench_qwen-all is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the envbench_qwen-all dataset. It reaches a loss of 0.1477 and an accuracy of 0.9511 on its evaluation set. It is intended for tasks that match its specialized training data and supports a context length of 32,768 tokens.


Model Overview

This model, waleko/Qwen3-8B-SFT-envbench_qwen-all, is a fine-tuned version of the Qwen/Qwen3-8B base model. It has 8 billion parameters and supports a context length of 32,768 tokens, which makes it suitable for longer inputs.
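As a rough illustration of working with the model and its context window, the sketch below loads the checkpoint with the Hugging Face Transformers library and computes how many tokens remain for generation after a prompt. It assumes the weights are available on the Hugging Face Hub under the identifier shown and that the `accelerate` package is installed for `device_map="auto"`; the helper names `generation_budget` and `load` are hypothetical and not part of the model card.

```python
MODEL_ID = "waleko/Qwen3-8B-SFT-envbench_qwen-all"
CONTEXT_LENGTH = 32768  # context length stated in the model card


def generation_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Tokens left for generation once the prompt occupies part of the context window."""
    return max(0, context_length - prompt_tokens)


def load(model_id: str = MODEL_ID):
    """Load tokenizer and model (assumes the repo id resolves on the Hugging Face Hub)."""
    # Imported lazily so generation_budget() can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the dtype stored in the checkpoint
        device_map="auto",    # requires the accelerate package
    )
    return tokenizer, model
```

A typical use would be `tokenizer, model = load()`, then counting prompt tokens with `len(tokenizer(text)["input_ids"])` and passing the result of `generation_budget(...)` as an upper bound for `max_new_tokens`.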

Training Details

The model was fine-tuned on the envbench_qwen-all dataset. On its evaluation set it reported:

  • Loss: 0.1477
  • Accuracy: 0.9511
  • Num Input Tokens Seen: 36,600,520
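To put the loss figure in more familiar terms, it can be converted to a perplexity. The snippet below is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which the model card does not state explicitly; under that assumption, perplexity is simply the exponential of the loss.

```python
import math

EVAL_LOSS = 0.1477  # evaluation loss from the model card

# Assumption: EVAL_LOSS is mean per-token cross-entropy measured in nats.
perplexity = math.exp(EVAL_LOSS)

print(f"perplexity ~ {perplexity:.3f}")
```

Under this assumption the perplexity comes out close to 1.16, meaning the model is, on average, nearly certain of each next token on the evaluation data.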

Training used a learning rate of 5e-05, a total batch size of 16 (reached through gradient accumulation), and a cosine learning rate scheduler with a warmup ratio of 0.1, run for 5 epochs. The training environment included Transformers 4.52.4 and PyTorch 2.6.0a0.
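The shape of a cosine schedule with warmup can be sketched in a few lines. The function below is an illustration rather than the training code: it uses the peak learning rate and warmup ratio from the card, while `TOTAL_STEPS` is a hypothetical value because the card does not report the number of optimizer steps. The rate ramps linearly from zero during warmup and then follows half a cosine wave down to zero.

```python
import math

PEAK_LR = 5e-5       # learning rate from the model card
WARMUP_RATIO = 0.1   # warmup ratio from the model card
TOTAL_STEPS = 1000   # hypothetical; the card does not state the step count


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` under linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 50, 100, 550, 1000):
        print(s, cosine_lr(s))
```

With these values the rate peaks at step 100, falls to half the peak at the midpoint of the decay phase, and reaches zero at the final step. The total batch size of 16 is the product of the per-device batch size, the number of devices, and the gradient accumulation steps; the card does not say how that product was split.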

Intended Use

Because it was fine-tuned specifically on the envbench_qwen-all dataset, this model is best suited to tasks that resemble the content of that dataset. Users should review the nature of the training data before applying the model to their own use cases.