agi-css/hh-rlhf-sft
Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 1 · Open weights

agi-css/hh-rlhf-sft is a 7-billion-parameter supervised fine-tuned (SFT) language model developed by agi-css, with a 4096-token context length. It is trained on the accepted ('chosen') responses from the Anthropic HH-RLHF dataset, aligning the model directly on preferred social interactions rather than through a separate reward model. This makes it an efficient and stable alternative to traditional RLHF, aimed at improving social alignment.
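As a rough illustration of the data selection described above: the public HH-RLHF release pairs each conversation with a `chosen` (accepted) and a `rejected` response, and SFT on accepted options keeps only the `chosen` side. A minimal sketch, using illustrative dummy records rather than the real dataset:

```python
# Sketch: selecting SFT training texts from HH-RLHF-style records.
# Field names ("chosen"/"rejected") follow the public Anthropic
# HH-RLHF release; the records below are illustrative dummies,
# not actual dataset contents.

def build_sft_examples(records):
    """Keep only the accepted ('chosen') text from each preference pair."""
    return [r["chosen"] for r in records if "chosen" in r]

records = [
    {"chosen": "\n\nHuman: Hi\n\nAssistant: Hello! How can I help?",
     "rejected": "\n\nHuman: Hi\n\nAssistant: Go away."},
]

sft_data = build_sft_examples(records)
print(len(sft_data))  # one SFT training text per record
```

The selected texts would then be tokenized and used for standard next-token-prediction fine-tuning, with no reward model in the loop.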
