cognAI/lil-c3po

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4K · Published: Dec 15, 2023 · License: MIT · Architecture: Transformer · Open weights

cognAI/lil-c3po is a 7 billion parameter open-source large language model created by deepnight-research, formed by a linear merge of two distinct fine-tuned Mistral-7B models. It inherits Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer from its base models. One parent model was fine-tuned with Direct Preference Optimization (DPO) on Intel Gaudi 2, while the other was instruct fine-tuned, making lil-c3po suitable for a broad range of language-related tasks.


Model Overview

lil-c3po is a 7 billion parameter open-source large language model developed by deepnight-research. It is the result of a linear merge of two distinct internally developed and fine-tuned Mistral-7B models, c3-1 and c3-2, aiming to combine their unique strengths for enhanced performance.

Key Architectural & Training Details

  • Architecture: Inherits features like Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer from its Mistral-7B base models.
  • c3-1: A 7B parameter model fine-tuned using Direct Preference Optimization (DPO) on the Intel Gaudi 2 processor, designed for various language tasks.
  • c3-2: An instruct fine-tuned version of Mistral-7B, contributing to improved language understanding in instructional contexts.
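The linear merge that produced lil-c3po amounts to a weighted average of the two parents' parameter tensors, key by key. A minimal sketch of the idea, using plain Python lists in place of real tensors (the function name and the 50/50 ratio are illustrative assumptions, not details published for this model):

```python
def linear_merge(state_a, state_b, weight_a=0.5):
    """Linearly interpolate two model state dicts: weight_a * A + (1 - weight_a) * B.

    Both dicts must share the same parameter names and shapes,
    as c3-1 and c3-2 do since both derive from Mistral-7B.
    """
    merged = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]
        merged[name] = [weight_a * a + (1.0 - weight_a) * b
                        for a, b in zip(tensor_a, tensor_b)]
    return merged

# Toy "state dicts" standing in for the c3-1 and c3-2 checkpoints.
c3_1 = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0, 0.0]}
c3_2 = {"layer.weight": [3.0, 4.0], "layer.bias": [2.0, 2.0]}
merged = linear_merge(c3_1, c3_2)  # equal blend of both parents
```

In practice a merge like this operates on full checkpoint tensors (e.g. via a merging toolkit) rather than Python lists, but the arithmetic is the same.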

Performance & Licensing

  • Benchmarks: Achieves an average score of 68.03 on the Open LLM Leaderboard, with notable scores including 84.45 on HellaSwag (10-shot) and 79.16 on Winogrande (5-shot). Detailed evaluation results are available on the Open LLM Leaderboard.
  • License: Released under the permissive MIT license, encouraging open-source collaboration.

Intended Use Cases

lil-c3po is suitable for a wide array of general language-related tasks, leveraging the combined capabilities of its fine-tuned components. For highly specific applications, further fine-tuning may be beneficial.
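Because both parent models derive from Mistral-7B, prompts are presumably expected in Mistral's `[INST]` instruct format. A hedged sketch of building such a prompt (the exact template is an assumption here and should be verified against the model's tokenizer configuration):

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in a Mistral-style [INST] template.

    Assumed format inherited from the Mistral-7B instruct parent;
    confirm against the model's published chat template before use.
    """
    return f"<s>[INST] {instruction.strip()} [/INST]"

prompt = build_prompt("Summarize the benefits of merging two fine-tuned models.")
```

The resulting string would then be passed to the model's tokenizer and generation loop as with any Mistral-family checkpoint.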