miulab/llama2-7b-alpaca-sft-10k

Task: Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · License: apache-2.0 · Architecture: Transformer (open weights)

miulab/llama2-7b-alpaca-sft-10k is a 7-billion-parameter language model based on Llama 2, developed by miulab and fine-tuned with supervised fine-tuning (SFT) on 10,000 Alpaca-style instructions. It serves as the backbone SFT model for the DogeRM research, which equips reward models with domain knowledge through model merging, and is intended for research into reward-model development and domain-specific knowledge integration.


Model Overview

The model builds on the Llama 2 architecture with 7 billion parameters. Supervised fine-tuning on a dataset of 10,000 Alpaca-style instructions gives it general instruction-following capabilities.
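Because the model follows the standard Llama 2 causal-LM architecture, it can be loaded with the Hugging Face transformers library. The sketch below is illustrative only; the Alpaca-style prompt template is an assumption based on the fine-tuning data, so check the repository for the exact template used during training.

```python
# Minimal inference sketch with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "miulab/llama2-7b-alpaca-sft-10k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so a 7B model fits on a single GPU
    device_map="auto",
)

# Alpaca-style prompt template (an assumption based on the SFT data;
# verify against the repository before relying on it).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what supervised fine-tuning is in one sentence.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```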

Key Characteristics

  • Base model: Llama 2 (7 billion parameters)
  • Fine-tuning: supervised fine-tuning (SFT) on 10,000 Alpaca-style instructions
  • Role: backbone SFT model for the DogeRM reward-modeling research
  • Context length: 4k tokens
  • License: apache-2.0

Intended Use Cases

This model is primarily intended for:

  • Research in Reward Modeling: Serving as a base for developing and experimenting with reward models, particularly for integrating domain-specific knowledge via model merging (a minimal merging sketch follows this list).
  • Understanding SFT Impact: Investigating the effects of SFT on Llama 2 models for instruction following.
  • Academic Exploration: Supporting studies related to model merging techniques and their application in enhancing LLM capabilities.
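DogeRM transfers domain knowledge into a reward model by merging its weights with those of a domain-specific SFT model. The paper describes the exact procedure; the snippet below is only a generic linear weight-interpolation sketch, assuming both checkpoints share the Llama 2 backbone and parameter names. `path/to/your-reward-model` and `alpha = 0.5` are hypothetical placeholders, not values from the paper.

```python
# Generic linear weight merge (NOT necessarily DogeRM's exact recipe):
# merged = (1 - alpha) * reward_model + alpha * domain_sft_model, parameter-wise.
import torch
from transformers import AutoModelForCausalLM

sft_id = "miulab/llama2-7b-alpaca-sft-10k"
reward_id = "path/to/your-reward-model"  # hypothetical placeholder

sft_model = AutoModelForCausalLM.from_pretrained(sft_id, torch_dtype=torch.float32)
reward_model = AutoModelForCausalLM.from_pretrained(reward_id, torch_dtype=torch.float32)

alpha = 0.5  # merge weight; a tunable hyperparameter, not a value from the paper
sft_state = sft_model.state_dict()
merged_state = reward_model.state_dict()

for name, param in merged_state.items():
    # Interpolate only parameters present in both checkpoints with matching
    # shapes; any reward-specific head is kept from the reward checkpoint.
    if name in sft_state and param.shape == sft_state[name].shape:
        merged_state[name] = (1 - alpha) * param + alpha * sft_state[name]

reward_model.load_state_dict(merged_state)
reward_model.save_pretrained("merged-reward-model")
```

In practice a reward model usually carries an extra scalar head on top of the transformer body; the name/shape check above leaves such parameters untouched and merges only the shared backbone.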

Detailed training and evaluation information is available via the Weights & Biases link provided in the original paper's resources.