ozone-research/Chirp-01

  • Hosted on: Hugging Face
  • Type: Text generation
  • Model size: 3.1B
  • Quantization: BF16
  • Context length: 32k
  • Published: Feb 23, 2025
  • License: qwen-research
  • Architecture: Transformer

Chirp-01 by Ozone Research is a 3.1 billion parameter language model, fine-tuned from Qwen2.5 3B Instruct with 50 million tokens of distilled GPT-4o data. This compact model demonstrates strong performance on benchmarks like MMLU Pro and IFEval, showing significant improvement over its base model. It is designed to push the capabilities of small-scale LLMs, making it suitable for research and applications requiring efficient yet powerful language understanding.


Overview

Chirp-01 is a 3.1 billion parameter language model developed by the Ozone Research team. It is fine-tuned from the Qwen2.5 3B Instruct base model, utilizing 50 million tokens of distilled data from GPT-4o. This training approach enables Chirp-01 to achieve strong performance for its size, making it a notable contribution to the field of compact, high-performing LLMs.
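The page does not include usage instructions, so here is a minimal sketch of loading the model with Hugging Face transformers. It assumes the model inherits the standard Qwen2.5 chat template from its base model; the prompt in `messages` is purely illustrative.

```python
# Sketch: loading Chirp-01 with Hugging Face transformers.
# Assumes `pip install transformers accelerate` and a Qwen2.5-style chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ozone-research/Chirp-01"  # repo id from the page header

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# Build the prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain model distillation in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```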

Key Capabilities & Performance

  • Parameter Count: 3.1 billion parameters, offering a balance of capability and efficiency.
  • Training Data: Leverages 50 million tokens distilled from GPT-4o, a method aimed at enhancing instruction following and general intelligence.
  • Benchmark Performance (a sketch of one way to run these benchmarks follows this list):
    • MMLU Pro: Achieves an overall average accuracy of 0.4320, roughly a 9-point gain over the base model.
    • IFEval: Scores 72%, a 14% improvement over the base model.
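The page does not say which harness or settings produced these numbers. As one common way to run MMLU Pro and IFEval locally, the sketch below uses EleutherAI's lm-evaluation-harness; the task names, dtype, and batch size here are assumptions, not the authors' evaluation setup.

```python
# Sketch: evaluating Chirp-01 on MMLU Pro and IFEval with lm-evaluation-harness.
# Assumes `pip install lm-eval` and a GPU with room for a 3.1B BF16 model.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=ozone-research/Chirp-01,dtype=bfloat16",
    tasks=["mmlu_pro", "ifeval"],  # task names as registered in the harness
    batch_size=8,  # assumption; tune to available memory
)

for task, metrics in results["results"].items():
    print(task, metrics)
```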

Use Cases

Chirp-01 is an open-source model designed to explore the limits of small-scale LLMs. Its strong benchmark performance for a 3B model makes it a good fit for:

  • Researchers and developers interested in efficient language models.
  • Applications where computational resources are constrained but robust language understanding is required (see the quantized-loading sketch after this list).
  • Further experimentation and development in the domain of compact AI models.
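For the constrained-resource case above, a 3.1B model can be shrunk further at load time. The sketch below is a generic transformers + bitsandbytes pattern, not something the model page prescribes: 4-bit loading trades some accuracy for a much smaller memory footprint (roughly 2 GB of weights instead of about 6 GB in BF16).

```python
# Sketch: loading Chirp-01 in 4-bit for memory-constrained environments.
# Assumes `pip install transformers accelerate bitsandbytes` and a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # weights stored in 4-bit, compute in BF16
)

model = AutoModelForCausalLM.from_pretrained(
    "ozone-research/Chirp-01",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ozone-research/Chirp-01")
```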