Model Overview
thu-ml/STAIR-Llama-3.1-8B-SFT is an 8 billion parameter instruction-tuned model, building upon the meta-llama/Llama-3.1-8B-Instruct architecture. Developed by thu-ml, this model is a core component of the STAIR framework, designed for enhanced reasoning and self-improvement capabilities.
Key Capabilities
- Step-level Chain-of-Thought (CoT) Reasoning: The model is fine-tuned on the STAIR-SFT dataset, which comprises 20,000 prompts from UltraFeedback and PKU-SafeRLHF, all formatted with step-level CoT answers. This training enables the model to produce detailed, step-by-step reasoning processes.
- Ethical and Safety Alignment: As demonstrated by its handling of sensitive queries, the model is designed to provide safe and ethical responses, refusing to engage in harmful or illegal requests while offering appropriate guidance.
- Structured Output: Responses are structured with
<|Reasoning_step|> and <|Output|> tags, allowing for easy extraction of final answers and analysis of the reasoning process.
Use Cases
- Complex Problem Solving: Ideal for applications requiring transparent, step-by-step reasoning to arrive at a solution.
- Content Moderation and Safety: Can be employed in scenarios where ethical considerations and refusal to generate harmful content are paramount.
- Educational Tools: Useful for generating explanations and thought processes behind answers, aiding in learning and understanding.
More details on the framework and usage can be found on the STAIR GitHub Repository.