AmberYifan/Qwen2-7B-sft-ultrachat-safeRLHF
AmberYifan/Qwen2-7B-sft-ultrachat-safeRLHF is a 7.6 billion parameter Qwen2-based language model fine-tuned by AmberYifan. It is a safety-aligned, instruction-tuned variant built on a prior supervised fine-tune (SFT) on UltraChat data, and is intended for general conversational AI applications where safety and adherence to instructions are prioritized.
Model Overview
AmberYifan/Qwen2-7B-sft-ultrachat-safeRLHF is a 7.6 billion parameter language model derived from the Qwen2 architecture. It represents a further fine-tuning of the AmberYifan/Qwen2-7B-sft-ultrachat model, specifically incorporating safety alignment through a process that likely involves Reinforcement Learning from Human Feedback (RLHF), as indicated by "safeRLHF" in its name. The initial SFT (Supervised Fine-Tuning) was performed on the Ultrachat dataset.
Key Capabilities
- Instruction Following: Designed to respond accurately and appropriately to user instructions.
- Safety Alignment: Incorporates safety measures to reduce harmful or undesirable outputs.
- Conversational AI: Suitable for general-purpose dialogue and question-answering tasks.
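The model card does not ship an official usage snippet, but a checkpoint like this can typically be queried through the standard Hugging Face `transformers` chat-template API. The sketch below assumes the model inherits Qwen2's chat template; the prompt and generation settings are illustrative, not from the card. Model loading is deferred inside the function so the lightweight helper can be used without `torch` installed.

```python
MODEL_ID = "AmberYifan/Qwen2-7B-sft-ultrachat-safeRLHF"


def build_messages(prompt: str) -> list[dict]:
    # Wrap a single user turn in the message format expected by
    # tokenizer.apply_chat_template.
    return [{"role": "user", "content": prompt}]


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    # Heavy dependencies are imported lazily: this downloads ~15 GB of
    # weights on first use and requires torch + transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Explain safety alignment in one sentence."))
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; sampling parameters can be adjusted for more varied conversational output.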
Training Details
This model was trained using the TRL (Transformer Reinforcement Learning) framework, version 0.12.2, with Supervised Fine-Tuning (SFT) as the foundational step. The underlying framework versions include Transformers 4.46.3 and PyTorch 2.5.1+cu118.
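The card names the framework versions but does not publish hyperparameters, the exact dataset split, or the safety-alignment (RLHF) recipe. The fragment below is therefore only an illustrative sketch of how an SFT stage is wired up in TRL 0.12-era APIs; the dataset identifier, output path, and all hyperparameters are placeholders, not the values actually used.

```python
# Illustrative TRL SFT setup. The model card does not document the real
# hyperparameters or dataset split, so every value below is a placeholder.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer


def build_trainer() -> SFTTrainer:
    # Placeholder dataset: the card says "Ultrachat" without naming an
    # exact Hub dataset or split.
    dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

    config = SFTConfig(
        output_dir="qwen2-7b-sft-ultrachat",   # hypothetical
        per_device_train_batch_size=2,          # hypothetical
        gradient_accumulation_steps=8,          # hypothetical
        learning_rate=2e-5,                     # hypothetical
        num_train_epochs=1,                     # hypothetical
    )
    # TRL's SFTTrainer accepts a Hub model ID directly; "Qwen/Qwen2-7B"
    # is assumed here as the base checkpoint.
    return SFTTrainer(
        model="Qwen/Qwen2-7B",
        args=config,
        train_dataset=dataset,
    )
```

The subsequent safety-alignment stage implied by "safeRLHF" in the model name (e.g., a preference-optimization or PPO pass) would be a separate TRL training run and is not reconstructed here, since the card gives no details.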
Intended Use Cases
- Developing chatbots requiring safe and instruction-tuned responses.
- Applications where a balance of general knowledge and safety is crucial.
- Prototyping conversational agents with a focus on controlled output generation.