cognAI/lil-c3po
cognAI/lil-c3po is a 7 billion parameter open-source large language model created by deepnight-research, formed by a linear merge of two distinct fine-tuned Mistral-7B models. It incorporates Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer. One base model was fine-tuned with DPO on Intel Gaudi 2, while the other was instruct fine-tuned, making lil-c3po suitable for a broad range of language-related tasks.
Model Overview
lil-c3po is a 7 billion parameter open-source large language model developed by deepnight-research. It is the result of a linear merge of two distinct internally developed and fine-tuned Mistral-7B models, c3-1 and c3-2, aiming to combine their unique strengths for enhanced performance.
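A linear merge of this kind simply interpolates the two checkpoints' weights parameter-by-parameter, which requires both models to share an architecture (here, Mistral-7B). The sketch below illustrates the idea on toy NumPy arrays standing in for state dicts; the `linear_merge` helper and the 50/50 `alpha` are illustrative assumptions, not the exact recipe deepnight-research used.

```python
import numpy as np

def linear_merge(state_a, state_b, alpha=0.5):
    """Linearly interpolate two state dicts: alpha * A + (1 - alpha) * B."""
    assert state_a.keys() == state_b.keys(), "models must share an architecture"
    return {name: alpha * state_a[name] + (1 - alpha) * state_b[name]
            for name in state_a}

# Toy stand-ins for two fine-tuned checkpoints of the same base model.
a = {"w": np.ones((2, 2)), "b": np.zeros(2)}
b = {"w": np.full((2, 2), 3.0), "b": np.full(2, 2.0)}

merged = linear_merge(a, b, alpha=0.5)
print(merged["w"][0, 0])  # 2.0 (midpoint of 1.0 and 3.0)
print(merged["b"][0])     # 1.0 (midpoint of 0.0 and 2.0)
```

In practice this is typically done with a merging tool over the full Hugging Face checkpoints rather than by hand.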
Key Architectural & Training Details
- Architecture: Inherits features like Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer from its Mistral-7B base models.
- c3-1: A 7B parameter model fine-tuned using Direct Preference Optimization (DPO) on the Intel Gaudi 2 accelerator, designed for various language tasks.
- c3-2: An instruct fine-tuned version of Mistral-7B, contributing to improved language understanding in instructional contexts.
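The two inherited attention features can be made concrete with a small sketch. Mistral-7B pairs 32 query heads with only 8 key/value heads (grouped-query attention), and limits each token's attention to a fixed window of prior positions (4096 in Mistral-7B; a tiny window is used below for readability). The head counts match the Mistral-7B config; the NumPy demonstration itself is an illustrative sketch, not library code.

```python
import numpy as np

# Grouped-query attention: 32 query heads share 8 KV heads,
# so each KV head serves a group of 32 // 8 = 4 query heads.
n_q_heads, n_kv_heads, head_dim, seq_len = 32, 8, 128, 8
group_size = n_q_heads // n_kv_heads

k = np.random.randn(n_kv_heads, seq_len, head_dim)
# At attention time the KV heads are repeated to line up with the query
# heads, but the KV cache only ever stores 8 heads instead of 32.
k_expanded = np.repeat(k, group_size, axis=0)
print(k_expanded.shape)  # (32, 8, 128)

# Sliding-window attention: token i may attend to positions
# (i - window, i] instead of the full causal prefix.
window = 3
rows = np.arange(seq_len)[:, None]
cols = np.arange(seq_len)[None, :]
mask = (cols <= rows) & (cols > rows - window)
print(mask.astype(int))
```

Together these cut the KV-cache size (4x fewer cached heads) and bound per-token attention cost at long sequence lengths.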
Performance & Licensing
- Benchmarks: Achieves an average score of 68.03 on the Open LLM Leaderboard, including 84.45 on HellaSwag (10-shot) and 79.16 on Winogrande (5-shot); detailed evaluation results are available on the leaderboard.
- License: Released under the permissive MIT license, encouraging open-source collaboration.
Intended Use Cases
lil-c3po is suitable for a wide array of general language-related tasks, leveraging the combined capabilities of its fine-tuned components. For highly specific applications, further fine-tuning may be beneficial.
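For instruction-style prompting, a reasonable starting point is the Mistral-7B-Instruct template, since one of lil-c3po's parent models is an instruct fine-tune of Mistral-7B. This is an assumption, not a documented guarantee; check the model card or the tokenizer's chat template before relying on it.

```python
def format_prompt(user_message: str) -> str:
    # Assumption: lil-c3po accepts the Mistral instruct template
    # ("<s>[INST] ... [/INST]") inherited from its instruct-tuned parent.
    return f"<s>[INST] {user_message} [/INST]"

prompt = format_prompt("Summarize grouped-query attention in one sentence.")
print(prompt)
```

The formatted string would then be tokenized and passed to the model for generation as with any Mistral-family checkpoint.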