kaist-ai/janus-orpo-7b: Personalized Response Generation
Janus-ORPO-7B is a 7-billion-parameter language model developed by KAIST AI, built on the Mistral-7B-v0.2 base model. Its core innovation is its training methodology: ORPO (Odds Ratio Preference Optimization) applied to the Multifaceted Collection, a dataset comprising 196,000 unique system messages. This training enables Janus to generalize to system messages unseen during training, giving users fine-grained control over response generation.
Key Capabilities
- Personalized Response Generation: Excels at tailoring outputs to diverse human preferences specified via system messages.
- Helpful and Harmless Alignment: Designed to produce responses that are both helpful and free of harmful content, even while following highly specific preferences.
- System Message Control: Users can influence the model's behavior and output style by providing specific system messages in the input prompt.
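As a sketch of the third point, the snippet below builds two prompts for the same instruction using the `[INST]{system_message}\n{instruction}[/INST]` template described under Training and Usage. The two system messages are illustrative examples, not entries from the Multifaceted Collection:

```python
# Hypothetical system messages illustrating fine-grained control;
# they are NOT taken from the Multifaceted Collection dataset.
CONCISE = "You are a terse assistant. Answer in a single sentence."
VERBOSE = "You are a patient teacher. Explain step by step with examples."

instruction = "Explain what ORPO is."

# Same instruction, different system messages -> different prompts,
# which the model maps to differently styled responses.
prompt_a = f"[INST]{CONCISE}\n{instruction}[/INST]"
prompt_b = f"[INST]{VERBOSE}\n{instruction}[/INST]"
```

Swapping only the system message leaves the instruction untouched, so any difference in the generated output is attributable to the stated preference.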
Training and Usage
The model was trained on the Multifaceted-Collection-ORPO dataset, which targets alignment to a broad spectrum of human preferences. At inference time it expects the `[INST]{system_message}\n{instruction}[/INST]` prompt format. For further details on training and evaluation, including the Multifaceted Bench for assessing personalized responses, refer to the GitHub Repository and the research paper.
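A minimal inference sketch built around the prompt format above. The Hugging Face transformers calls shown in the comments reflect typical usage for a Mistral-style causal LM and are an assumption; consult the repository for the authors' recommended generation settings.

```python
def format_janus_prompt(system_message: str, instruction: str) -> str:
    """Wrap inputs in the [INST]{system_message}\n{instruction}[/INST]
    format that Janus-ORPO-7B expects at inference time."""
    return f"[INST]{system_message}\n{instruction}[/INST]"

prompt = format_janus_prompt(
    "You are a helpful assistant writing for a general audience.",
    "Summarize the idea behind preference optimization.",
)

# Typical Hugging Face usage for a Mistral-based model (assumed, not
# verified against the official repository):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("kaist-ai/janus-orpo-7b")
#   model = AutoModelForCausalLM.from_pretrained("kaist-ai/janus-orpo-7b")
#   inputs = tokenizer(prompt, return_tensors="pt")
#   output = model.generate(**inputs, max_new_tokens=512)
#   print(tokenizer.decode(output[0], skip_special_tokens=True))
```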