Artples/L-MChat-Small

License: MIT
Overview

L-MChat-Small: A Compact Merged Language Model

L-MChat-Small is a 3-billion-parameter language model developed by Artples to investigate the performance potential of smaller, merged architectures. It trades the scale of larger models for efficiency while remaining useful for conversational tasks.

Key Capabilities & Features

  • Architecture: A merged model built with the SLERP method, combining rhysjones/phi-2-orange-v2 and Weyaxi/Einstein-v4-phi2 (see the sketch after this list).
  • Parameter Count: 3 billion parameters, offering a more compact footprint compared to larger models.
  • Context Length: Supports a 2048-token context window.
  • Performance: Achieves an average score of 63.14 on the Open LLM Leaderboard, including 61.60 on the AI2 Reasoning Challenge (ARC) and 75.90 on HellaSwag.
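
SLERP (spherical linear interpolation) merges two models by interpolating their weights along the great circle between them rather than along a straight line, which preserves weight magnitudes better than plain averaging. Below is a minimal, self-contained sketch of SLERP over a single pair of weight tensors; the single interpolation factor `t` and the flattening step are simplifications for illustration, not the exact per-layer recipe used to produce L-MChat-Small.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors with factor t in [0, 1]."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_norm, b_norm = a_flat.norm(), b_flat.norm()
    a_unit, b_unit = a_flat / (a_norm + eps), b_flat / (b_norm + eps)

    # Angle between the two weight directions.
    dot = torch.clamp(torch.dot(a_unit, b_unit), -1.0, 1.0)
    theta = torch.acos(dot)

    # Nearly parallel tensors: fall back to plain linear interpolation.
    if theta < 1e-4:
        return torch.lerp(a_flat, b_flat, t).reshape(a.shape).to(a.dtype)

    # Interpolate the direction along the great circle, then rescale
    # by the linearly interpolated magnitude.
    sin_theta = torch.sin(theta)
    direction = (
        (torch.sin((1 - t) * theta) / sin_theta) * a_unit
        + (torch.sin(t * theta) / sin_theta) * b_unit
    )
    magnitude = (1 - t) * a_norm + t * b_norm
    return (direction * magnitude).reshape(a.shape).to(a.dtype)
```

In an actual merge, this interpolation is applied tensor-by-tensor across the two source checkpoints, typically with interpolation factors that vary by layer and parameter type.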

Use Cases & Strengths

  • General Chat Applications: Optimized for conversational interactions using the ChatML format, as shown in the usage sketch after this list.
  • Resource-Constrained Environments: Its smaller size makes it suitable for deployment where computational resources are limited.
  • Exploration of Merge Methods: Demonstrates the effectiveness of the SLERP merge method for creating capable models from existing smaller bases.
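
The following is a minimal usage sketch with the transformers library, assuming the repository's tokenizer ships a ChatML chat template (as the ChatML prompt format above suggests); the generation settings are illustrative, not recommended defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Artples/L-MChat-Small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# ChatML-style conversation; apply_chat_template renders the
# <|im_start|>/<|im_end|> markers if the tokenizer defines that template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a merged language model is."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```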