ShenaoZhang/0.001_idpo_iter_1

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 8k · Published: Apr 5, 2024 · License: MIT · Architecture: Transformer · Open Weights

ShenaoZhang/0.001_idpo_iter_1 is a fine-tuned language model based on HuggingFaceH4/mistral-7b-sft-beta, developed by ShenaoZhang. It was fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset and is intended for tasks that benefit from instruction-following capabilities derived from preference data. The published documentation does not detail the model's specific optimizations or primary use cases.


Overview

ShenaoZhang/0.001_idpo_iter_1 is a fine-tuned language model derived from the HuggingFaceH4/mistral-7b-sft-beta base model. It was fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset, which suggests optimization for instruction following and preference alignment.
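
Since the checkpoint derives from a standard Mistral-7B base, a reasonable assumption is that it loads with the ordinary transformers API and inherits the base model's chat template. The following is a minimal inference sketch under that assumption, not a documented usage pattern from the model card; the prompt is purely illustrative.

```python
# Minimal inference sketch. Assumes the checkpoint is loadable with the
# standard transformers API and inherits the base model's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ShenaoZhang/0.001_idpo_iter_1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative single-turn prompt using the chat template
messages = [{"role": "user", "content": "Summarize preference alignment in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```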

Training Details

The model was trained with the following key hyperparameters:

  • Learning Rate: 5e-07
  • Batch Size: 8 (train and eval)
  • Gradient Accumulation Steps: 2, for a total effective batch size of 128 (consistent with a per-device batch size of 8 across 8 devices)
  • Optimizer: Adam with standard betas and epsilon
  • LR Scheduler: Cosine type with a 0.1 warmup ratio
  • Epochs: 1

This configuration (a single epoch at a low learning rate) is typical of a light preference-alignment pass that adapts an already instruction-tuned base model rather than retraining it. Further details on the model's intended uses, limitations, and performance metrics are not available in the provided documentation.
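
The exact training script is not documented. As an illustration only, the reported hyperparameters can be expressed as a TRL DPOConfig, a plausible choice given the "idpo" (iterative DPO) naming and the preference dataset; treat this as a hypothetical sketch, not the author's actual code.

```python
# Hypothetical sketch mapping the reported hyperparameters onto TRL's
# DPOConfig. The output path is invented; Adam betas/epsilon are left at
# their standard defaults, matching the card's description.
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="./0.001_idpo_iter_1",   # hypothetical output path
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,      # 8 devices x 8 x 2 = 128 effective
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```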