ferrazzipietro/Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch-baseline
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Mar 5, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

ferrazzipietro/Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch-baseline is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained for 3 epochs with a cosine learning rate schedule and a total batch size of 64. The fine-tuning dataset, intended uses, and specific differentiators from the base model are not documented.


Model Overview

This model, ferrazzipietro/Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch-baseline, is an 8-billion-parameter language model derived from the Qwen/Qwen3-8B base architecture. It was fine-tuned for 3 epochs on a multi-GPU setup with 2 devices and a total training batch size of 64, using the AdamW optimizer (betas=(0.9, 0.95), epsilon=1e-12) at a learning rate of 5e-06, with a cosine learning rate scheduler and a 0.1 warmup ratio.
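Since the weights are open and published under the repository id in the title, the model can presumably be loaded with the standard transformers API. A minimal sketch, assuming the generic Qwen3 chat-template usage pattern (the prompt and generation settings are illustrative, not from the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ferrazzipietro/Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch-baseline"

# Load tokenizer and model; device_map="auto" spreads the 8B weights
# across available GPUs, torch_dtype="auto" keeps the checkpoint dtype.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Qwen3 models ship a chat template; build the prompt from messages.
messages = [{"role": "user", "content": "Explain beam search in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```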

Key Training Details

  • Base Model: Qwen/Qwen3-8B
  • Parameters: 8 Billion
  • Epochs: 3
  • Learning Rate: 5e-06
  • Optimizer: AdamW_TORCH (betas=(0.9, 0.95), epsilon=1e-12)
  • Scheduler: Cosine with 0.1 warmup ratio
  • Total Batch Size: 64 (train and eval)
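For reference, these hyperparameters map directly onto Hugging Face TrainingArguments. A minimal sketch, assuming the values above; the per-device batch size and gradient accumulation split is an assumption, since only the total effective batch size of 64 across 2 GPUs is reported:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-finetune",
    num_train_epochs=3,
    learning_rate=5e-6,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-12,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    # Only the total of 64 is documented; this split is hypothetical:
    # 8 per device x 2 GPUs x 4 accumulation steps = 64.
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
)
```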

Current Limitations

Details regarding the fine-tuning dataset, the model's intended uses, and its limitations are not available. As a result, its differentiators and optimized capabilities relative to the base Qwen3-8B model are not explicitly stated. Users should be aware that, without further information, the model's suitability for any specific task is undefined.