grimjim/cuckoo-starling-32k-7B
grimjim/cuckoo-starling-32k-7B is a 7-billion-parameter merged language model created by grimjim using the SLERP method, combining Mistral-Starling-merge-trial1-7B and kukulemon-7B. It features an adjusted RoPE theta for improved narrative coherence and supports a 32K-token context window. The model targets general language understanding and generation, and performs well across common reasoning and commonsense benchmarks.
grimjim/cuckoo-starling-32k-7B Overview
This 7 billion parameter model, developed by grimjim, is a merged language model created using the SLERP method. It combines two base models: grimjim/Mistral-Starling-merge-trial1-7B and grimjim/kukulemon-7B. A key feature is its manually adjusted RoPE theta (down to 100K), which aims to balance performance for long context queries with narrative coherence, supporting a 32K token context window.
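SLERP (spherical linear interpolation) merges two models by interpolating each pair of weight tensors along the arc between them on the unit sphere, rather than averaging them linearly. The sketch below is a minimal, self-contained illustration of the interpolation formula itself (not the full mergekit pipeline the author used); the function name and NumPy-based implementation are illustrative assumptions.

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t: interpolation factor in [0, 1] (0 returns a, 1 returns b).
    a, b: weight arrays of the same shape.
    Falls back to linear interpolation when the tensors are nearly parallel,
    where the spherical formula becomes numerically unstable.
    """
    a_flat, b_flat = a.ravel(), b.ravel()
    # Normalize so the angle between the tensors is well defined.
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    omega = np.arccos(dot)  # angle between the two tensors
    if np.abs(np.sin(omega)) < eps:
        # Nearly parallel: plain linear interpolation is fine.
        return (1 - t) * a + t * b
    sin_omega = np.sin(omega)
    merged = (np.sin((1 - t) * omega) / sin_omega) * a_flat \
           + (np.sin(t * omega) / sin_omega) * b_flat
    return merged.reshape(a.shape)
```

In a real merge this interpolation is applied per tensor (often with a per-layer schedule for `t`), which is what tools like mergekit automate.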
Key Capabilities & Performance
The model has been lightly tested with ChatML and natively supports Alpaca prompts. It demonstrates solid performance across standard benchmarks, as evaluated on the Open LLM Leaderboard:
- Average Score: 69.93
- AI2 Reasoning Challenge (25-shot): 66.81
- HellaSwag (10-shot): 85.97
- MMLU (5-shot): 64.88
- TruthfulQA (0-shot): 59.03
- Winogrande (5-shot): 80.11
- GSM8k (5-shot): 62.77
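Since the model is used with ChatML or Alpaca templating, it can help to see what those prompt layouts look like. The helpers below are a minimal sketch of the two formats, assuming the standard ChatML `<|im_start|>`/`<|im_end|>` markers and the standard Alpaca section headers; the function names are illustrative, not part of the model's API.

```python
def chatml_prompt(system: str, user: str) -> str:
    # ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers
    # and ends with an open assistant turn for the model to complete.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

def alpaca_prompt(instruction: str, user_input: str = "") -> str:
    # Alpaca uses labeled Instruction / Input / Response sections;
    # the Input section is included only when context is provided.
    header = (
        "Below is an instruction that describes a task"
        + (", paired with an input that provides further context"
           if user_input else "")
        + ". Write a response that appropriately completes the request.\n\n"
    )
    prompt = header + f"### Instruction:\n{instruction}\n\n"
    if user_input:
        prompt += f"### Input:\n{user_input}\n\n"
    return prompt + "### Response:\n"
```

Either string can then be tokenized and passed to the model as-is, with generation stopping at the format's end-of-turn marker.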
When to Use This Model
This model is suitable for applications requiring:
- General-purpose text generation and understanding with a focus on maintaining narrative coherence over extended contexts.
- Tasks benefiting from a 32K token context window, such as summarizing long documents or engaging in extended conversations.
- Exploration of merged model capabilities, particularly those derived from Mistral-based architectures.
- Use cases compatible with ChatML or Alpaca prompting formats.