ferrazzipietro/unsup-Qwen3-1.7B-datav3

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 2B
  • Quantization: BF16
  • Context length: 32k
  • Published: Feb 18, 2026
  • License: apache-2.0
  • Architecture: Transformer (open weights)

ferrazzipietro/unsup-Qwen3-1.7B-datav3 is a 1.7 billion parameter language model fine-tuned by ferrazzipietro from Qwen3-1.7B. It was trained for one epoch at a context length of 32,768 tokens, reaching a final validation loss of 0.2568. Its specific capabilities and intended uses are not documented, and the fine-tuning dataset is undisclosed.


Model Overview

ferrazzipietro/unsup-Qwen3-1.7B-datav3 is derived from the Qwen3-1.7B base model and was fine-tuned by ferrazzipietro for a single epoch at a context length of 32,768 tokens. The dataset used for fine-tuning has not been disclosed.
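Since the checkpoint follows the standard Qwen3 layout, it should load through the usual transformers API. The sketch below is untested against this specific repository; it assumes the repo ships a tokenizer alongside the weights and that accelerate is installed for device_map="auto".

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ferrazzipietro/unsup-Qwen3-1.7B-datav3"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
        device_map="auto",
    )

    prompt = "Summarize what a 32k context window lets a model do."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))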

Training Details

The training process used the following hyperparameters (a sketch mapping them onto a training configuration follows the list):

  • Learning Rate: 0.0003
  • Batch Sizes: train_batch_size of 128, eval_batch_size of 16, and a total_train_batch_size of 512 (with 4 gradient accumulation steps).
  • Optimizer: ADAMW_TORCH with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: 1
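For reference, these settings map onto transformers.TrainingArguments roughly as follows. This is a reconstruction from the reported values, not the author's actual training script, and it assumes a single training device (128 × 4 accumulation steps = 512 effective batch):

    from transformers import TrainingArguments

    # Reconstructed from the reported hyperparameters; the model, dataset,
    # and Trainer wiring are unknown and omitted.
    training_args = TrainingArguments(
        output_dir="unsup-Qwen3-1.7B-datav3",
        learning_rate=3e-4,                # 0.0003
        per_device_train_batch_size=128,   # train_batch_size
        per_device_eval_batch_size=16,     # eval_batch_size
        gradient_accumulation_steps=4,     # 128 * 4 = 512 total batch
        num_train_epochs=1,
        optim="adamw_torch",               # default betas and epsilon
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        bf16=True,                         # matches the BF16 listing
    )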

Training loss decreased from 4.2641 at step 1,000 to 1.8969 at step 16,000, while validation loss improved from 0.3924 to a final 0.2568.

Limitations

As the original model card notes, the model's description, intended uses, and limitations are not documented. The fine-tuning dataset is also unspecified, which limits any assessment of its specialized capabilities.