W-61/llama3-8b-base-new-method-s_star0.6-20260426-230653
W-61/llama3-8b-base-new-method-s_star0.6-20260426-230653 is an 8-billion-parameter language model fine-tuned by W-61. It is based on W-61/llama-3-8b-base-sft-ultrachat-8xh200 and further trained on the HuggingFaceH4/ultrafeedback_binarized dataset. The model is optimized for tasks requiring nuanced understanding and generation grounded in preference data, reaching a loss of 0.5352 on its evaluation set, and is suited to applications that benefit from instruction following and preference alignment.
Model Overview
W-61/llama3-8b-base-new-method-s_star0.6-20260426-230653 is an 8-billion-parameter language model developed by W-61. It is a fine-tuned iteration of the W-61/llama-3-8b-base-sft-ultrachat-8xh200 base model, trained on the HuggingFaceH4/ultrafeedback_binarized preference dataset.
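The checkpoint can be loaded like any other causal LM on the Hub. A minimal sketch using the Hugging Face `transformers` library (this assumes `transformers` and `torch` are installed and that enough accelerator memory is available for an 8B model; the dtype and device-map choices below are illustrative, not prescribed by the card):

```python
MODEL_ID = "W-61/llama3-8b-base-new-method-s_star0.6-20260426-230653"

def load_model():
    """Download and load the fine-tuned checkpoint.

    Sketch only: assumes the `transformers` and `torch` packages and a GPU
    with enough memory for an 8B-parameter model. The bfloat16 dtype and
    `device_map="auto"` are illustrative defaults, not card requirements.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # halves memory vs. float32
        device_map="auto",           # spread layers across available devices
    )
    return tokenizer, model
```

Calling `load_model()` triggers the download, so it is kept as a function rather than run at import time.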
Key Training Details
This model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 128, using a multi-GPU setup of 4 devices and the AdamW optimizer. Key evaluation metrics from the training run include:
- `loss`: 0.5352
- `fcm_dpo/beta`: 0.0111
- `margin_dpo/margin_mean`: 54.0836
- `logps/chosen`: -383.4114
- `logps/rejected`: -416.5982
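The reported beta and margin plug into the standard DPO objective, whose per-example loss is -log σ(β · margin), where the margin is the chosen-minus-rejected difference of policy-vs-reference log-ratios. A small arithmetic sketch (feeding the card's mean margin through the loss is illustrative only; the mean batch loss is not simply the loss of the mean margin):

```python
import math

def dpo_loss(beta: float, margin: float) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    `margin` = [log pi(chosen) - log ref(chosen)]
             - [log pi(rejected) - log ref(rejected)].
    """
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustration using the card's reported beta and mean margin:
print(round(dpo_loss(0.0111, 54.0836), 4))
```

A larger margin or beta drives the sigmoid toward 1 and the loss toward 0, which is why the margin mean is a useful health check alongside the raw loss.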
Intended Use Cases
While specific intended uses are not detailed, the fine-tuning on a preference dataset like ultrafeedback_binarized suggests this model is well-suited for tasks that benefit from alignment with human preferences, such as:
- Instruction following
- Response generation where quality is judged by human feedback
- Applications requiring nuanced understanding of preferred outputs
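For the instruction-following uses above, a generation call might look like the following sketch. It assumes `transformers` and `torch` are installed; whether this checkpoint ships a chat template is an assumption, so the code falls back to a raw prompt if none is present:

```python
def generate_reply(user_message: str, max_new_tokens: int = 256) -> str:
    """Generate a response to a single instruction.

    Sketch only: assumes the `transformers` and `torch` packages and enough
    GPU memory for an 8B model. Whether the tokenizer carries a chat
    template is an assumption, hence the fallback to a plain prompt.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "W-61/llama3-8b-base-new-method-s_star0.6-20260426-230653"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    if tokenizer.chat_template is not None:
        prompt = tokenizer.apply_chat_template(
            [{"role": "user", "content": user_message}],
            tokenize=False,
            add_generation_prompt=True,
        )
    else:
        prompt = user_message  # no template: treat the message as-is

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) is shown for reproducibility; sampling parameters can be substituted as usual.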