Lambent/qwen2.5-reinstruct-alternate-lumen-14B

Text generation · Model size: 14.8B · Quant: FP8 · Context length: 32k · Published: Sep 23, 2024 · Architecture: Transformer

Lambent/qwen2.5-reinstruct-alternate-lumen-14B is a 14.8-billion-parameter language model created by Lambent, based on the Qwen2.5-14B-Instruct architecture. It is a merge of Qwen/Qwen2.5-14B-Instruct and Lambent/qwen2.5-lumen-rebased-14B, built with the della merge method to improve instruction following and general intelligence. The model supports a 32,768-token context length and is designed for text generation across multiple languages, with improved scores on instruction-following benchmarks such as IFEval.


Overview

Lambent/qwen2.5-reinstruct-alternate-lumen-14B is a 14.8 billion parameter instruction-tuned language model developed by Lambent. It is a merged model, combining the strengths of Qwen/Qwen2.5-14B-Instruct and Lambent/qwen2.5-lumen-rebased-14B using the della merge method. The primary goal of this merge was to improve instruction following capabilities and general intelligence, specifically addressing issues observed in previous merges and aiming to heal aspects of the original Instruct model.
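The della merge method is implemented in the mergekit library, and a merge of this kind is typically described with a small YAML configuration. The sketch below uses the two model names from this card, but the specific parameter values (density, weight, dtype) are illustrative assumptions, not the author's actual recipe:

```yaml
# Hypothetical mergekit configuration sketch for a della merge.
# Model names come from the card; density/weight/dtype values are assumed.
models:
  - model: Lambent/qwen2.5-lumen-rebased-14B
    parameters:
      density: 0.5   # fraction of delta parameters retained (assumed)
      weight: 0.5    # contribution of this model's deltas (assumed)
merge_method: della
base_model: Qwen/Qwen2.5-14B-Instruct
dtype: bfloat16
```

Della-style merges prune and rescale each model's parameter deltas relative to the base model before combining them, which is why a base model and per-model density/weight parameters appear in the configuration.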

Key Capabilities & Performance

  • Enhanced Instruction Following: The model was specifically engineered to improve instruction adherence, achieving an IFEval (0-Shot) strict accuracy of 47.94. This metric indicates its proficiency in understanding and executing given instructions.
  • Multilingual Support: It supports a wide array of languages including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
  • General Reasoning: Performance on benchmarks like BBH (3-Shot) is 48.99 normalized accuracy, and MMLU-PRO (5-shot) is 48.76 accuracy, suggesting solid general reasoning capabilities for its size.
  • Context Length: The model supports a substantial context window of 32,768 tokens, allowing it to process longer inputs and generate more coherent, extended responses.
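When building on a fixed 32,768-token window, it can help to budget the prompt before sending it. The sketch below uses a crude characters-per-token heuristic as a pre-check; the ratio of 4 characters per token is an assumption, and exact counts require the model's own tokenizer (e.g. `AutoTokenizer` from transformers):

```python
# Rough context-window budgeting for a 32,768-token model.
# The 4-chars-per-token ratio is a crude heuristic; for exact counts,
# tokenize with the model's own tokenizer instead.

CTX_LIMIT = 32_768

def fits_in_context(prompt: str, reserve_for_output: int = 1_024,
                    chars_per_token: float = 4.0) -> bool:
    """Estimate whether a prompt, plus room reserved for the model's
    output, fits inside the context window."""
    est_prompt_tokens = len(prompt) / chars_per_token
    return est_prompt_tokens + reserve_for_output <= CTX_LIMIT
```

A check like this is only a guard against obviously oversized inputs; borderline cases should be verified with a real token count.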

When to Use This Model

This model is particularly well-suited for applications requiring robust instruction following and text generation in a multilingual context. Its improved performance on instruction-based tasks makes it a strong candidate for chatbots, content generation, and complex query answering where precise adherence to prompts is crucial. Developers looking for a 14B parameter model with a large context window and enhanced instruction capabilities, especially those familiar with the Qwen2.5 family, should consider this model.
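For developers integrating the model into a chat application, the Qwen2.5 family uses a ChatML-style prompt format. In practice the tokenizer builds this for you via `tokenizer.apply_chat_template(...)` in transformers; the hand-rolled sketch below just shows the expected shape of the prompt, assuming the standard ChatML template:

```python
# Hand-rolled sketch of the ChatML-style prompt format used by the
# Qwen2.5 family.  In real code, prefer tokenizer.apply_chat_template(
# messages, add_generation_prompt=True); this only illustrates the shape.

def build_chatml_prompt(messages: list[dict]) -> str:
    """Format [{'role': ..., 'content': ...}, ...] as a ChatML prompt."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the della merge method."},
])
```

The trailing `<|im_start|>assistant\n` marker is what prompts the model to generate the assistant turn rather than continue the user's text.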