W-61/llama-3-8b-base-hh-harmless-sft-4xh100
W-61/llama-3-8b-base-hh-harmless-sft-4xh100 is an 8 billion parameter language model fine-tuned from meta-llama/Meta-Llama-3-8B. It was trained on the Anthropic/hh-rlhf dataset to improve harmlessness and align outputs with human preferences, and it inherits the base model's 8192-token context length, making it suitable for safety-focused conversational applications.
Model Overview
This model, W-61/llama-3-8b-base-hh-harmless-sft-4xh100, is an 8 billion parameter language model derived from the meta-llama/Meta-Llama-3-8B base model. Its distinguishing feature is supervised fine-tuning on the Anthropic/hh-rlhf dataset, a training step typically used to improve harmlessness, reduce undesirable outputs, and align the model's responses more closely with human safety preferences.
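Assuming the checkpoint follows the standard Hugging Face `transformers` layout (as Llama-3 derivatives generally do), it can presumably be loaded like any other causal LM. The sketch below uses only well-known `transformers` APIs; the dtype and device settings are illustrative choices, not something the card prescribes:

```python
# Sketch: loading the checkpoint with Hugging Face transformers.
# Assumes `transformers` and `torch` are installed and the checkpoint is
# accessible; dtype/device settings are illustrative, not taken from the card.
MODEL_ID = "W-61/llama-3-8b-base-hh-harmless-sft-4xh100"


def load_model():
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # H100-friendly precision; adjust as needed
        device_map="auto",
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_model()
```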
Key Training Details
- Base Model: meta-llama/Meta-Llama-3-8B
- Fine-tuning Dataset: Anthropic/hh-rlhf
- Training Hyperparameters:
- Learning Rate: 2e-05
- Batch Size (train/eval): 8
- Gradient Accumulation Steps: 4
- Total Train Batch Size: 128
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- Epochs: 1
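The hyperparameters above are internally consistent with the 4xH100 setup implied by the model name: a per-device batch of 8 and 4 gradient accumulation steps across 4 GPUs (the GPU count is inferred from "4xh100", not stated in the list) yield the reported total train batch size:

```python
# Sanity check: effective (total) train batch size from the listed
# hyperparameters. The GPU count of 4 is an inference from the model
# name ("4xh100"), not an explicit value in the hyperparameter list.
per_device_batch_size = 8
gradient_accumulation_steps = 4
num_gpus = 4  # assumption from "4xh100"

total_train_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_gpus
)
print(total_train_batch_size)  # 128, matching the card
```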
Intended Use Cases
Given its fine-tuning on a harmlessness dataset, this model is particularly suited for applications where safety and reduced toxicity are critical. It can be considered for conversational agents, content moderation, or any scenario requiring a language model that adheres to strict safety protocols.
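Because the fine-tuning data is Anthropic/hh-rlhf, whose dialogues use the `\n\nHuman: ...\n\nAssistant:` turn format, prompting in that format is a reasonable starting point. The helper and generation call below are a sketch under that assumption (verify against the checkpoint's actual training template), and the sampling parameters are illustrative:

```python
def build_hh_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the Anthropic/hh-rlhf dialogue style.

    This format is an assumption based on the fine-tuning dataset; confirm
    it matches the checkpoint's training template before relying on it.
    """
    return f"\n\nHuman: {user_message}\n\nAssistant:"


def generate_reply(tokenizer, model, user_message: str) -> str:
    # Illustrative generation settings; tune for your application.
    inputs = tokenizer(build_hh_prompt(user_message), return_tensors="pt")
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
    # Strip the prompt tokens; decode only the newly generated reply.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```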