pansophic/rocket-3B
Rocket-3B by pansophic is a 2.2 billion parameter GPT-like language model, fine-tuned using Direct Preference Optimization (DPO) primarily for English chat applications. Despite its compact size, it achieves strong performance on benchmarks like MT-Bench and AlpacaEval, often surpassing larger models. It is optimized for conversational tasks and designed for efficient deployment.
Loading preview...
Rocket-3B: A Compact Yet Powerful Chat Model
pansophic's Rocket-3B is a 2.2 billion parameter language model, fine-tuned from Stability AI's StableLM-3B-4E1T using Direct Preference Optimization (DPO). This approach, combined with a mix of publicly available datasets, has resulted in a highly effective chat model, primarily for English language tasks.
Key Capabilities & Performance
Rocket-3B stands out for its ability to deliver strong performance despite its small size. It achieves notable scores on key benchmarks:
- MT-Bench: An average score of 6.56, outperforming several larger models like Llama2-Chat-7B and Falcon-40B-Instruct.
- AlpacaEval: A win rate of nearly 80% with an average response length of 1,242 tokens, indicating detailed and effective responses.
- Open LLM Leaderboard: An average score of 55.77, with specific results like 76.69 on HellaSwag and 55.82 on TruthfulQA.
Intended Uses & Limitations
Rocket-3B is designed as an effective chat model, trained with the ChatML format. Its compact size makes it suitable for applications where computational resources are a consideration. However, it's important to note that, unlike models such as ChatGPT, Rocket-3B lacks in-the-loop filtering and safety alignment features, meaning it may generate problematic outputs if not carefully prompted. The model was trained on a filtered mixture of datasets including Falcon RefinedWeb, RedPajama-Data (without Books3), The Pile (without Books3), and StarCoder.