ferrazzipietro/unsup-Qwen3-1.7B-datav3
ferrazzipietro/unsup-Qwen3-1.7B-datav3 is a 1.7 billion parameter language model fine-tuned by ferrazzipietro, based on the Qwen3-1.7B architecture. The model was trained for one epoch with a context length of 32768 tokens, reaching a final validation loss of 0.2568. Its specific capabilities and intended uses are not detailed, as it was fine-tuned on an undisclosed dataset.
Model Overview
ferrazzipietro/unsup-Qwen3-1.7B-datav3 is a 1.7 billion parameter language model derived from the Qwen3-1.7B architecture. It was fine-tuned by ferrazzipietro for a single epoch with a context length of 32768 tokens, though the dataset used for this process remains undisclosed.
Training Details
The model was fine-tuned with the following hyperparameters:
- Learning Rate: 0.0003
- Batch Sizes: `train_batch_size` of 128, `eval_batch_size` of 16, and a `total_train_batch_size` of 512 (with 4 gradient accumulation steps)
- Optimizer: ADAMW_TORCH with default betas and epsilon
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
- Epochs: 1
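These settings are internally consistent: the effective batch size of 512 is the per-device train batch size of 128 multiplied by 4 gradient accumulation steps. As a rough sketch (not the actual training code), a cosine scheduler with a 0.1 warmup ratio ramps the learning rate linearly up to the peak of 3e-4 over the first 10% of steps, then decays it along a cosine curve:

```python
import math

def lr_at_step(step, total_steps, peak_lr=3e-4, warmup_ratio=0.1):
    """Cosine learning-rate schedule with linear warmup.

    Illustrative sketch mirroring the card's hyperparameters
    (peak_lr=3e-4, warmup_ratio=0.1); not the exact training code.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example with a hypothetical 16000-step run: warmup ends at step 1600
print(lr_at_step(800, 16000))    # mid-warmup: half of the peak rate
print(lr_at_step(1600, 16000))   # peak learning rate (3e-4)
print(lr_at_step(16000, 16000))  # end of training: decayed to ~0
```

The `total_steps` value of 16000 here is only an illustration based on the step counts quoted in the loss figures below; the card does not state the true total.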
During training, the training loss decreased from 4.2641 at 1000 steps to 1.8969 at 16000 steps, while the validation loss improved from 0.3924 to a final value of 0.2568.
Limitations
As noted in the original model card, more information is needed regarding the model's specific description, intended uses, and limitations. The dataset used for fine-tuning is also not specified, which limits understanding of its specialized capabilities.