ferrazzipietro/review-Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch

Hugging Face
Text Generation · Open Weights
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 32k
  • Architecture: Transformer
  • License: apache-2.0
  • Published: Mar 16, 2026
  • Concurrency Cost: 1

The ferrazzipietro/review-Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained for 3 epochs with a learning rate of 5e-06 and a cosine learning-rate scheduler. The model card does not describe its intended use cases or what distinguishes it from Qwen/Qwen3-8B.


Model Overview

This model, ferrazzipietro/review-Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch, is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. Fine-tuning ran for 3 epochs on a multi-GPU setup with 2 devices.

Training Details

Fine-tuning used the following hyperparameters:

  • Learning Rate: 5e-06
  • Optimizer: ADAMW_TORCH with betas=(0.9, 0.95) and epsilon=1e-12
  • Batch Size: total training batch size of 64 (train_batch_size 4 per device × gradient_accumulation_steps 8 × 2 devices)
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
  • Epochs: 3
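The listed settings can be sanity-checked with a little arithmetic. The Python sketch below assumes that each of the 2 training devices received a per-device batch of 4. It also assumes the cosine scheduler has the linear-warmup-then-cosine-decay-to-zero shape of Transformers' `get_cosine_schedule_with_warmup`, with warmup steps rounded up from the ratio. The card does not report the total number of optimizer steps (it depends on the undisclosed dataset size), so the `total_steps` value used below is hypothetical.

```python
import math

# Hyperparameters as listed in the training details.
PEAK_LR = 5e-6
WARMUP_RATIO = 0.1
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 2


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of samples contributing to one optimizer update."""
    return per_device * accum * devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    # Assumption: warmup length is the ratio times total steps, rounded up.
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
              ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay to 0 at total_steps."""
    w = warmup_steps(total_steps, ratio)
    if step < w:
        return peak_lr * step / max(1, w)
    progress = (step - w) / max(1, total_steps - w)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 64
    total = 1000  # hypothetical step count, for illustration only
    for s in (0, 50, 100, 550, 1000):
        print(s, cosine_lr(s, total))
```

With a hypothetical 1,000 steps, the rate climbs to 5e-06 over the first 100 steps, is roughly halved at step 550, and reaches zero at the final step.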

Training used Transformers 4.57.0, PyTorch 2.10.0+cu128, Datasets 3.6.0, and Tokenizers 0.22.2.
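You can check whether a local environment matches the one used for training by comparing installed versions against those listed above. The following minimal sketch uses the standard library's `importlib.metadata`. The helper names `installed_versions` and `find_mismatches` are hypothetical, and it assumes PyTorch is installed under the PyPI distribution name `torch`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional, Tuple

# Library versions reported for the training run, keyed by PyPI distribution name
# (assumption: PyTorch is installed as the "torch" distribution).
TRAINING_VERSIONS: Dict[str, str] = {
    "transformers": "4.57.0",
    "torch": "2.10.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.22.2",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up the installed version of each distribution; None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(
    expected: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, installed)} for every package whose version string differs."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    report = find_mismatches(TRAINING_VERSIONS, installed_versions(TRAINING_VERSIONS))
    for name, (want, have) in report.items():
        print(f"{name}: trained with {want}, found {have or 'not installed'}")
    if not report:
        print("Environment matches the reported training versions.")
```

The comparison is an exact string match. A PyTorch build whose version string lacks the `+cu128` local suffix is therefore reported as a mismatch even when the release number agrees.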

Current Limitations

The README marks the model description, intended uses, limitations, and the training and evaluation data as needing more information. Developers should account for these gaps when judging whether the model suits a particular application.