waleko/Qwen3-8B-SFT-envbench_qwen-green-yellow

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Mar 29, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The waleko/Qwen3-8B-SFT-envbench_qwen-green-yellow model is an 8 billion parameter language model, fine-tuned from Qwen/Qwen3-8B. This model has been specifically adapted using the envbench_qwen-green-yellow dataset, achieving an accuracy of 0.9472 on its evaluation set. It is designed for tasks aligned with its fine-tuning data, demonstrating strong performance in environments similar to its training regimen. The model processes a context length of 32768 tokens, making it suitable for applications requiring extensive contextual understanding.

Loading preview...

Model Overview

waleko/Qwen3-8B-SFT-envbench_qwen-green-yellow is an 8 billion parameter language model, fine-tuned from the base Qwen/Qwen3-8B architecture. This specific iteration has undergone supervised fine-tuning (SFT) on the envbench_qwen-green-yellow dataset.

Performance Highlights

During its evaluation, the model demonstrated notable performance metrics:

  • Loss: 0.1656
  • Accuracy: 0.9472
  • Input Tokens Seen: 2,242,920

These results indicate its proficiency in tasks related to the envbench_qwen-green-yellow dataset.

Training Details

The fine-tuning process utilized the following key hyperparameters:

  • Learning Rate: 5e-05
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: 1 (train), 1 (eval) with 4 gradient accumulation steps, resulting in a total train batch size of 16
  • Epochs: 5.0
  • LR Scheduler: Cosine type with a 0.1 warmup ratio

The model was trained across 4 multi-GPU devices, ensuring efficient processing. It leverages Transformers 4.52.4, Pytorch 2.6.0a0+df5bbc09d1.nv24.12, Datasets 3.6.0, and Tokenizers 0.21.1.

Intended Use Cases

Given its fine-tuning on the envbench_qwen-green-yellow dataset, this model is best suited for applications and tasks that align closely with the characteristics and domain of its training data. Its high accuracy on the evaluation set suggests strong performance in similar environments.