thu-ml/STAIR-Llama-3.1-8B-SFT

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Jan 17, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

thu-ml/STAIR-Llama-3.1-8B-SFT is an 8 billion parameter instruction-tuned causal language model developed by thu-ml and fine-tuned from meta-llama/Llama-3.1-8B-Instruct. It is trained on the STAIR-SFT dataset, which consists of 20,000 prompts from UltraFeedback and PKU-SafeRLHF, to align reasoning formats and facilitate self-improvement. The model generates step-level Chain-of-Thought (CoT) responses, making it suitable for tasks that require structured reasoning and ethical response generation.

Model Overview

thu-ml/STAIR-Llama-3.1-8B-SFT is an 8 billion parameter instruction-tuned model, fine-tuned from meta-llama/Llama-3.1-8B-Instruct. Developed by thu-ml, it is a core component of the STAIR framework, which is designed for enhanced reasoning and self-improvement capabilities.

Key Capabilities

  • Step-level Chain-of-Thought (CoT) Reasoning: The model is fine-tuned on the STAIR-SFT dataset, which comprises 20,000 prompts from UltraFeedback and PKU-SafeRLHF, all formatted with step-level CoT answers. This training enables the model to produce detailed, step-by-step reasoning processes.
  • Ethical and Safety Alignment: The model is designed to respond safely and ethically to sensitive queries, refusing harmful or illegal requests while offering appropriate guidance.
  • Structured Output: Responses are structured with <|Reasoning_step|> and <|Output|> tags, allowing for easy extraction of final answers and analysis of the reasoning process.
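
The tagged format described above lends itself to simple post-processing. The following is a minimal sketch, not the authors' own tooling: the function name parse_stair_response is hypothetical, and the closing tags <|/Reasoning_step|> and <|/Output|> are an assumption (only the opening tags are named here), so the parser treats them as optional and strips them if present.

```python
import re

STEP_TAG = "<|Reasoning_step|>"
OUTPUT_TAG = "<|Output|>"

# Assumption: if the model emits closing tags, they take this form.
CLOSING_TAG = re.compile(r"<\|/(?:Reasoning_step|Output)\|>")


def parse_stair_response(text):
    """Split a tagged response into (reasoning_steps, final_answer).

    final_answer is None when no <|Output|> tag is found.
    """
    # Everything before the first <|Output|> tag holds the reasoning steps.
    head, found, tail = text.partition(OUTPUT_TAG)

    # Drop whatever precedes the first step tag, then clean each step.
    raw_steps = head.split(STEP_TAG)[1:]
    steps = [CLOSING_TAG.sub("", s).strip() for s in raw_steps]
    steps = [s for s in steps if s]

    answer = CLOSING_TAG.sub("", tail).strip() if found else None
    return steps, answer


if __name__ == "__main__":
    # Hand-written sample text, not actual model output.
    sample = (
        "<|Reasoning_step|>\nStep 1: analyze.\n<|/Reasoning_step|>\n"
        "<|Reasoning_step|>Step 2: decide.<|/Reasoning_step|>\n"
        "<|Output|>\nFinal answer.\n<|/Output|>"
    )
    steps, answer = parse_stair_response(sample)
    for i, step in enumerate(steps, start=1):
        print(f"[{i}] {step}")
    print("Answer:", answer)
```

If the tags are registered as special tokens in the tokenizer, decoding with special tokens skipped would remove them before parsing, so the raw decoded text should be checked first.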

Use Cases

  • Complex Problem Solving: Ideal for applications requiring transparent, step-by-step reasoning to arrive at a solution.
  • Content Moderation and Safety: Can be employed in scenarios where ethical considerations and refusal to generate harmful content are paramount.
  • Educational Tools: Useful for generating explanations and thought processes behind answers, aiding in learning and understanding.

More details on the framework and usage can be found on the STAIR GitHub Repository.