grimjim/cuckoo-starling-32k-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: May 16, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer

grimjim/cuckoo-starling-32k-7B is a 7 billion parameter merged language model created by grimjim with the SLERP method, combining Mistral-Starling-merge-trial1-7B and kukulemon-7B. It features an adjusted RoPE theta aimed at narrative coherence and supports a 32K token context window. The model targets general language understanding and generation, posting solid scores across reasoning and common-sense benchmarks.


grimjim/cuckoo-starling-32k-7B Overview

This 7 billion parameter model, developed by grimjim, was created by merging two base models with spherical linear interpolation (SLERP): grimjim/Mistral-Starling-merge-trial1-7B and grimjim/kukulemon-7B. A key feature is its manually lowered RoPE theta (down to 100K), a trade-off intended to retain long-context performance while preserving narrative coherence across its 32K token context window.
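To make the merge method concrete, here is a minimal sketch of the per-tensor SLERP operation such a merge applies: corresponding weight tensors from the two parent models are interpolated along the arc between them rather than along a straight line. This illustrates the technique only and is not grimjim's actual merge configuration; the `slerp` helper and the 0.5 interpolation factor are assumptions for demonstration.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors (illustrative)."""
    a, b = v0.flatten().float(), v1.flatten().float()
    a_n = a / (a.norm() + eps)
    b_n = b / (b.norm() + eps)
    dot = torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0)
    theta = torch.acos(dot)
    if theta.abs() < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        merged = (1 - t) * a + t * b
    else:
        sin_theta = torch.sin(theta)
        merged = (torch.sin((1 - t) * theta) / sin_theta) * a \
               + (torch.sin(t * theta) / sin_theta) * b
    return merged.reshape(v0.shape).to(v0.dtype)

# Applied across two state dicts sharing the same architecture:
# merged_state = {k: slerp(0.5, sd_a[k], sd_b[k]) for k in sd_a}
```

In practice such merges are typically driven by a tool like mergekit, which can also vary the interpolation factor per layer rather than using a single global value.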

Key Capabilities & Performance

The model has been lightly tested with ChatML prompts and natively supports the Alpaca format (example templates follow the benchmark list below). It posts solid scores across standard benchmarks on the Open LLM Leaderboard:

  • Average Score: 69.93
  • AI2 Reasoning Challenge (25-shot): 66.81
  • HellaSwag (10-shot): 85.97
  • MMLU (5-shot): 64.88
  • TruthfulQA (0-shot): 59.03
  • Winogrande (5-shot): 80.11
  • GSM8k (5-shot): 62.77
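For reference, the snippet below sketches the two prompting formats mentioned above. The instruction text and system message are placeholders; only the structural markers (the `<|im_start|>`/`<|im_end|>` tokens for ChatML and the `### Instruction:`/`### Response:` headers for Alpaca) follow the respective conventions.

```python
# ChatML-style prompt (lightly tested with this model; system message is a placeholder)
chatml_prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Summarize the following chapter in three sentences.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Alpaca-style prompt (natively supported)
alpaca_prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "Summarize the following chapter in three sentences.\n\n"
    "### Response:\n"
)
```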

When to Use This Model

This model is suitable for applications requiring:

  • General-purpose text generation and understanding with a focus on maintaining narrative coherence over extended contexts.
  • Tasks benefiting from a 32K token context window, such as summarizing long documents or sustaining extended conversations (see the loading sketch after this list).
  • Exploration of merged model capabilities, particularly those derived from Mistral-based architectures.
  • Use cases compatible with ChatML or Alpaca prompting formats.
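As a starting point, the sketch below loads the model with Hugging Face Transformers and generates from an Alpaca-style prompt. The dtype and sampling parameters are illustrative assumptions, not settings recommended by the model card; the `rope_theta` print simply reads the value stored in the model config.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "grimjim/cuckoo-starling-32k-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; choose a dtype your hardware supports
    device_map="auto",
)

# The adjusted RoPE theta is recorded in the model config.
print("rope_theta:", model.config.rope_theta)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nContinue the story below in a consistent narrative voice.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```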