W-61/llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 24, 2026 · Architecture: Transformer

W-61/llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch is an 8-billion-parameter Llama 3 base model fine-tuned by W-61. It builds on W-61/llama-3-8b-base-sft-ultrachat-8xh200 and was further trained on the HuggingFaceH4/ultrafeedback_binarized dataset. The model targets general language understanding and generation, with the additional fine-tuning aimed at improving performance in conversational contexts.


Model Overview

This model, llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch, is an 8-billion-parameter language model developed by W-61. It was obtained by further training W-61/llama-3-8b-base-sft-ultrachat-8xh200 on the HuggingFaceH4/ultrafeedback_binarized dataset.
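
Because the model derives from a standard Llama 3 checkpoint, it should load with the usual transformers APIs. A minimal inference sketch, assuming the repo ID above is available on the Hugging Face Hub and that the repo ships full-precision weights (the FP8 quant in the header likely describes the serving configuration, not the stored weights):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16/fp16 weights in the repo
    device_map="auto",
)

prompt = "Explain the difference between supervised fine-tuning and preference tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```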

Training Details

The model was trained for a single epoch with a learning rate of 5e-07. Key training hyperparameters, also reconstructed as a configuration sketch after this list, include:

  • Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08.
  • Batch Size: A train_batch_size of 8 and eval_batch_size of 8; with 4 GPUs and 4 gradient accumulation steps, this gives a total_train_batch_size of 128 (8 × 4 × 4).
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
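
These values map directly onto transformers.TrainingArguments. A reconstruction sketch; the output directory and any argument not listed above are assumptions, not details from the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-base-new-method",  # hypothetical output path
    num_train_epochs=1,
    learning_rate=5e-7,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 8 per device × 4 GPUs × 4 steps = 128 effective
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```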

Intended Use

The card does not document specific intended uses or limitations, but fine-tuning on a feedback-binarized dataset suggests potential strengths in the areas below (a data-inspection sketch follows the list):

  • Conversational AI: Generating more aligned and helpful responses in dialogue systems.
  • Instruction Following: Improved ability to adhere to given instructions based on human feedback data.
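
Since these strengths are attributed to the feedback-binarized training data, it can help to see what that data contains. A minimal sketch, assuming the published schema of HuggingFaceH4/ultrafeedback_binarized (preference splits with prompt, chosen, and rejected message lists):

```python
from datasets import load_dataset

# Load the preference-pairs split of the dataset named in the card.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

example = ds[0]
print(example["prompt"][:200])                   # the user prompt
print(example["chosen"][-1]["content"][:200])    # preferred (chosen) response
print(example["rejected"][-1]["content"][:200])  # dispreferred (rejected) response
```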

This model is built upon the robust Llama 3 architecture, making it suitable for a wide range of natural language processing tasks where an 8B parameter model is appropriate.