W-61/llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch
W-61/llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch is an 8 billion parameter Llama 3 base model, fine-tuned by W-61. This model is a fine-tuned version of W-61/llama-3-8b-base-sft-ultrachat-8xh200, specifically trained on the HuggingFaceH4/ultrafeedback_binarized dataset. It is designed for general language understanding and generation tasks, leveraging its Llama 3 architecture and targeted fine-tuning for improved performance in conversational contexts.
Model Overview
This model, llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch, is an 8 billion parameter language model developed by W-61. It is a fine-tuned iteration of the W-61/llama-3-8b-base-sft-ultrachat-8xh200 model, specifically enhanced through training on the HuggingFaceH4/ultrafeedback_binarized dataset.
Training Details
The model underwent a single epoch of training with a learning rate of 5e-07. Key training hyperparameters include:
- Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08.
- Batch Size: A `train_batch_size` of 8 and `eval_batch_size` of 8, with 4 gradient accumulation steps across 4 multi-GPU devices, giving a `total_train_batch_size` of 128.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
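The numbers above fit together as follows: the effective batch size is the per-device batch multiplied by the device count and the gradient accumulation steps, and the cosine scheduler ramps the learning rate linearly over the first 10% of steps before decaying it. A minimal sketch (the total step count below is an illustrative assumption, not taken from the training run):

```python
import math

# Hyperparameters stated in the card.
PEAK_LR = 5e-07
PER_DEVICE_BATCH = 8
NUM_GPUS = 4
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1

def effective_batch_size(per_device, num_devices, accum_steps):
    """Examples contributing to each optimizer update."""
    return per_device * num_devices * accum_steps

def cosine_lr_with_warmup(step, total_steps,
                          peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch_size(PER_DEVICE_BATCH, NUM_GPUS, GRAD_ACCUM_STEPS))  # 128
```

This matches the card's reported `total_train_batch_size` of 128 (8 × 4 × 4); the scheduler shape mirrors what the Transformers cosine scheduler with warmup computes per step.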
Intended Use
The card does not document specific intended uses or limitations, but fine-tuning on a feedback-binarized dataset suggests potential strengths in:
- Conversational AI: Generating more aligned and helpful responses in dialogue systems.
- Instruction Following: Improved ability to adhere to given instructions based on human feedback data.
This model is built upon the robust Llama 3 architecture, making it suitable for a wide range of natural language processing tasks where an 8B parameter model is appropriate.