Gueule-d-ange/qwen1.5b-sft-1k
Text Generation · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · License: apache-2.0 · Architecture: Transformer · Open Weights

Gueule-d-ange/qwen1.5b-sft-1k is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B. It was trained for one epoch with a learning rate of 2e-05 and a total batch size of 128. Details about its primary differentiators, intended uses, and training data have not been published.


Model Overview

Gueule-d-ange/qwen1.5b-sft-1k is a fine-tuned variant of the Qwen/Qwen2.5-1.5B base model. The 1.5-billion-parameter model underwent a single epoch of supervised fine-tuning (SFT).
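Since the card provides no usage instructions, the snippet below is only a minimal sketch of how such a fine-tune is typically loaded with the Hugging Face transformers library. The chat-template call assumes the repo inherits the Qwen2.5 tokenizer configuration, which is unverified.

```python
# Minimal loading sketch; usage is assumed to mirror the Qwen2.5 base model,
# since the card publishes no instructions of its own.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gueule-d-ange/qwen1.5b-sft-1k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Explain gradient accumulation in one sentence."}]
# Assumes a Qwen2.5-style chat template is present in the tokenizer config.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```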

Training Details

The training process used the AdamW optimizer with a learning rate of 2e-05. The total training batch size was 128, achieved with a per-device train_batch_size of 1 and gradient_accumulation_steps of 16 across 8 devices (1 × 16 × 8 = 128). Training used mixed precision and a cosine learning rate scheduler with a warmup ratio of 0.03.
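For reference, these hyperparameters map onto the following transformers `TrainingArguments` configuration. This is a reconstruction from the reported values, not the author's actual training script; the `optim` and `bf16` flags are inferred from the AdamW mention and the BF16 quantization listed in the card header.

```python
# Reconstructed from the reported hyperparameters; not the author's script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen1.5b-sft-1k",
    learning_rate=2e-05,
    num_train_epochs=1,
    per_device_train_batch_size=1,   # 1 sample per device
    gradient_accumulation_steps=16,  # 1 x 16 x 8 devices = 128 effective batch
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="adamw_torch",             # AdamW, as reported
    bf16=True,                       # mixed precision (BF16 assumed, per card header)
)
```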

Current Status

As of now, specific information regarding the dataset used for fine-tuning, the model's intended uses, limitations, and evaluation data is not publicly available. Users should exercise caution and conduct their own evaluations to determine suitability for specific applications.