Name: ferrazzipietro/review-Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ferrazzipietro

Model Overview

ferrazzipietro/review-Qwen3-8B-reas-int-065-only-loss-noprompt-3epoch is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. Fine-tuning ran for 3 epochs on a multi-GPU setup with 2 devices.

Training Details

Fine-tuning used the following hyperparameters:

Learning Rate: 5e-06
Optimizer: ADAMW_TORCH with betas=(0.9, 0.95) and epsilon=1e-12
Batch Size: Total training batch size of 64 (train_batch_size of 4 per device × gradient_accumulation_steps of 8 × 2 devices)
Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
Epochs: 3
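
The hyperparameters above can be checked with a little arithmetic. The following is a minimal sketch, not the author's training code: it recomputes the total batch size from the listed values and evaluates a cosine schedule with linear warmup at a few steps. The total step count of 1000 is a hypothetical placeholder, since the dataset size is not reported, and rounding the warmup length up is an assumption about how the ratio is turned into steps.

```python
import math

# Values taken from the hyperparameter list above.
PER_DEVICE_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
NUM_DEVICES = 2
PEAK_LEARNING_RATE = 5e-6
WARMUP_RATIO = 0.1

# Hypothetical: the real number of optimizer steps depends on the
# (unreported) dataset size.
TOTAL_STEPS = 1000


def effective_batch_size(per_device: int, accumulation_steps: int, devices: int) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * accumulation_steps * devices


def cosine_lr_with_warmup(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay back to 0."""
    # Assumption: the warmup length is the ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = effective_batch_size(
        PER_DEVICE_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES
    )
    print(f"effective batch size: {total}")
    for step in (0, 50, 100, 550, 1000):
        lr = cosine_lr_with_warmup(step, TOTAL_STEPS, PEAK_LEARNING_RATE, WARMUP_RATIO)
        print(f"step {step:>4}: lr = {lr:.3e}")
```

With a warmup ratio of 0.1, the learning rate reaches its peak of 5e-06 after the first tenth of training and then decays toward zero by the final step.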

Training was conducted with Transformers 4.57.0, PyTorch 2.10.0+cu128, Datasets 3.6.0, and Tokenizers 0.22.2.
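
To see how a local environment compares with the versions listed above, a small standard-library check along these lines may help. It is a sketch under assumptions: the dictionary keys are assumed to be the PyPI distribution names for these libraries, and an exact string match is used, which may be stricter than needed (for example, a different CUDA build suffix on PyTorch would be reported as a mismatch).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in the training details; keys are assumed to be
# the PyPI distribution names.
REPORTED_VERSIONS = {
    "transformers": "4.57.0",
    "torch": "2.10.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.22.2",
}


def compare_versions(reported: dict) -> dict:
    """Map each package to (reported, installed or None, exact match?)."""
    result = {}
    for name, expected in reported.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        result[name] = (expected, installed, installed == expected)
    return result


if __name__ == "__main__":
    for name, (expected, installed, ok) in compare_versions(REPORTED_VERSIONS).items():
        status = "match" if ok else "differs"
        print(f"{name}: reported {expected}, installed {installed} ({status})")
```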

Current Limitations

The README does not yet describe the model, its intended uses, its limitations, or the datasets used for training and evaluation; it marks each of these as needing more information. Developers should weigh these unknowns when evaluating the model's suitability for a particular application.
