yakazimir/simpo-exps_qwen05b
yakazimir/simpo-exps_qwen05b is a 0.6-billion-parameter language model fine-tuned from trl-lib/qwen1.5-0.5b-sft. It was trained with a learning rate of 8e-08 and a cosine learning rate scheduler over 60 steps. Its primary differentiator and specific use cases are not detailed in the available information, but it achieves a rewards accuracy of 0.5230 on its evaluation set.
Overview
The yakazimir/simpo-exps_qwen05b model is a fine-tuned variant of the trl-lib/qwen1.5-0.5b-sft base model, with approximately 0.6 billion parameters. The dataset used for fine-tuning is not specified. Training used a learning rate of 8e-08, a batch size of 1, and 16 gradient accumulation steps, for a total of 60 training steps.
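Assuming the checkpoint is published on the Hugging Face Hub under the ID above and follows the standard causal-LM interface of its Qwen1.5 base, a minimal loading-and-generation sketch with transformers might look like this (the prompt is purely illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID as listed in this card; assumes the repo is public on the Hub.
model_id = "yakazimir/simpo-exps_qwen05b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation of an illustrative prompt.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```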
Key Capabilities
- Fine-tuned Base Model: Built upon the trl-lib/qwen1.5-0.5b-sft architecture.
- Evaluation Metrics: Achieved a loss of 0.7797 and a rewards accuracy of 0.5230 on its evaluation set, with a rewards margin of 0.0860.
- Training Configuration: Utilized an Adam optimizer with specific beta and epsilon values, and a cosine learning rate scheduler with a 0.1 warmup ratio (see the configuration sketch after this list).
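For reference, the reported hyperparameters can be expressed as a transformers TrainingArguments sketch. This is an illustration rather than the authors' training script: the Adam beta and epsilon values are not restated in this card, so the transformers defaults (0.9, 0.999, 1e-08) stand in for them, and output_dir is a hypothetical path.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="simpo-exps_qwen05b",   # hypothetical output path
    learning_rate=8e-08,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,    # effective batch size of 16
    max_steps=60,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    # Adam betas/epsilon left at library defaults; not stated in this card.
)
```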
Good for
- Research into Fine-tuning: Potentially useful for researchers studying the effects of specific fine-tuning parameters on Qwen1.5-0.5B models, given the detailed training hyperparameters.
- Baseline Comparisons: Can serve as a baseline for comparing performance against other fine-tuned models in the 0.6B parameter class, particularly when evaluating reward-based metrics (sketched below).
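As context for the reward-based metrics cited above, the sketch below shows how rewards accuracy and rewards margin are conventionally computed in TRL-style preference training: accuracy is the fraction of preference pairs where the chosen response outscores the rejected one, and margin is their mean difference. The tensor values are made up for demonstration; this is not the model's actual evaluation code.

```python
import torch

def reward_metrics(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor):
    # Fraction of pairs where the chosen response scores higher.
    accuracy = (chosen_rewards > rejected_rewards).float().mean().item()
    # Mean gap between chosen and rejected scores.
    margin = (chosen_rewards - rejected_rewards).mean().item()
    return accuracy, margin

# Hypothetical per-pair reward scores, for demonstration only.
chosen = torch.tensor([0.12, -0.05, 0.30, 0.01])
rejected = torch.tensor([0.02, 0.04, 0.10, -0.03])
acc, margin = reward_metrics(chosen, rejected)
print(f"rewards/accuracies: {acc:.4f}, rewards/margins: {margin:.4f}")
```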
Further details regarding its intended uses, limitations, and the specific training and evaluation data are not provided in the current model description.