ShenaoZhang/0.001_idpo_iter_1
ShenaoZhang/0.001_idpo_iter_1 is a fine-tuned language model based on HuggingFaceH4/mistral-7b-sft-beta, developed by ShenaoZhang. It was fine-tuned on the HuggingFaceH4/ultrafeedback_binarized preference dataset and is intended for tasks that benefit from instruction-following behavior learned from preference data. Detailed information on its specific optimizations and primary use cases is not provided in the model card.
Overview
ShenaoZhang/0.001_idpo_iter_1 is a fine-tuned language model derived from the HuggingFaceH4/mistral-7b-sft-beta base model. It was fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset, which suggests it is optimized for instruction following and preference alignment.
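The "idpo" in the model name and the binarized preference dataset suggest a DPO-style (Direct Preference Optimization) objective, although the card does not state the exact training method. As an illustrative sketch only, a DPO loss on a single preference pair compares policy and reference log-probabilities of the chosen and rejected responses (the `beta` value below is a common default, not taken from this model's card):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin is the policy-vs-reference log-ratio gap between
    the chosen and rejected responses."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A larger margin in favor of the chosen response yields a lower loss.
loss_good = dpo_loss(-1.0, -2.0, -1.5, -1.5)  # policy prefers chosen
loss_bad = dpo_loss(-2.0, -1.0, -1.5, -1.5)   # policy prefers rejected
```

When the policy matches the reference exactly, the margin is zero and the loss equals log 2, which is the standard starting point for this objective.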
Training Details
The model was trained with the following key hyperparameters:
- Learning Rate: 5e-07
- Batch Size: 8 per device (train and eval)
- Gradient Accumulation Steps: 2, for a total effective batch size of 128 (8 per device × 2 accumulation steps × 8 devices)
- Optimizer: Adam with standard betas and epsilon
- LR Scheduler: Cosine type with a 0.1 warmup ratio
- Epochs: 1
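The schedule above (cosine decay with a 0.1 warmup ratio and a 5e-7 peak learning rate) can be sketched in plain Python. This is an illustrative approximation of the common "linear warmup, then cosine decay to zero" schedule, not code from the actual training run:

```python
import math

LEARNING_RATE = 5e-7  # peak learning rate from the model card
WARMUP_RATIO = 0.1    # warmup ratio from the model card

def lr_at(step, total_steps, peak_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr over the first warmup_ratio of training,
    then cosine decay to zero over the remaining steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the learning rate ramps linearly to 5e-7 at step 100, then decays along a cosine curve back to zero by the final step.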
This configuration reflects a single-epoch preference fine-tuning pass adapting the base model to instruction-based interactions. Intended uses, limitations, and evaluation results are not documented in the model card.