ferrazzipietro/Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch-baseline is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained for 3 epochs with a cosine learning-rate schedule and a total batch size of 64. The fine-tuning dataset and intended uses are not documented, so what distinguishes it from the base model is currently unclear.
Model Overview
This model, ferrazzipietro/Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch-baseline, is an 8-billion-parameter language model derived from the Qwen/Qwen3-8B base architecture. It was fine-tuned for 3 epochs on a multi-GPU setup with 2 devices and a total training batch size of 64, using the AdamW optimizer and a cosine learning-rate scheduler with a 0.1 warmup ratio (full hyperparameters below).
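For anyone who wants to try the checkpoint, a minimal loading sketch with the Hugging Face transformers library might look like the following. This is not taken from the model card; it assumes the checkpoint is published on the Hub in standard transformers format, and the prompt is an arbitrary placeholder.

```python
# Hypothetical usage sketch -- assumes the checkpoint is hosted on the
# Hugging Face Hub in standard transformers format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ferrazzipietro/Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch-baseline"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # place layers across available devices
)

prompt = "Explain the difference between a list and a tuple in Python."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that loading an 8B-parameter model requires substantial GPU memory; quantized loading (e.g. via bitsandbytes) may be needed on smaller cards.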
Key Training Details
- Base Model: Qwen/Qwen3-8B
- Parameters: 8 Billion
- Epochs: 3
- Learning Rate: 5e-06
- Optimizer: AdamW (adamw_torch; betas=(0.9, 0.95), epsilon=1e-12)
- Scheduler: Cosine with 0.1 warmup ratio
- Total Batch Size: 64 (train and eval)
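The learning-rate settings above combine into a single per-step schedule: linear warmup over the first 10% of steps up to the 5e-06 peak, then cosine decay to zero. A minimal sketch of that standard schedule (this is illustrative, not code from the training run; the step counts are placeholders):

```python
import math

def lr_at(step, total_steps, peak_lr=5e-6, warmup_ratio=0.1):
    """Cosine learning-rate schedule with linear warmup.

    peak_lr and warmup_ratio default to the hyperparameters listed above;
    total_steps depends on the (undocumented) dataset size and is a placeholder.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example over a hypothetical 1000-step run:
# step 0 -> 0.0, step 100 (end of warmup) -> 5e-06, step 1000 -> ~0.0
```

In a transformers training setup, the equivalent behavior comes from `lr_scheduler_type="cosine"` together with `warmup_ratio=0.1` in `TrainingArguments`.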
Current Limitations
Details of the fine-tuning dataset, the model's intended uses, and its limitations are not currently available, so its primary differentiators or optimized capabilities relative to the base Qwen3-8B model are not explicitly stated. Without this information, users should evaluate the model's suitability for their specific tasks themselves.