berkecr/tr-dare-merge-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · License: apache-2.0 · Architecture: Transformer · Open Weights

berkecr/tr-dare-merge-7B is a 7-billion-parameter language model created by berkecr, built on the Mistral-7B-Instruct-v0.2 architecture. It is a DARE merge of Mistral-7B-Instruct-v0.2 with TURKCELL/Turkcell-LLM-7b-v1 and Trendyol/Trendyol-LLM-7b-chat-dpo-v1.0, two Turkish-focused models, and supports a context length of 4096 tokens. The merge aims to combine the general instruction-following ability of the Mistral base with the strengths of the other two models in their original training domains. The merge method is 'dare_ties', with specific density and weight parameters applied to each merged model.


Model Overview

The berkecr/tr-dare-merge-7B is a 7-billion-parameter language model developed by berkecr. It is constructed with the DARE merge method, specifically dare_ties, combining mistralai/Mistral-7B-Instruct-v0.2, which serves as the merge's base model, with TURKCELL/Turkcell-LLM-7b-v1 and Trendyol/Trendyol-LLM-7b-chat-dpo-v1.0.

Key Characteristics

  • Architecture: Based on the Mistral-7B-Instruct-v0.2 framework.
  • Parameter Count: 7 billion parameters, small enough to run on a single consumer or workstation GPU when quantized.
  • Context Length: Supports a context window of 4096 tokens.
  • Merge Method: Utilizes the dare_ties merging technique, applying a density of 0.7 and a weight of 0.4 to both the Turkcell and Trendyol models during the merge.
  • Configuration: The merge is performed in bfloat16 with the int8_mask parameter set to true.
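To make the density and weight parameters above concrete, here is a simplified NumPy sketch of the dare_ties idea: each fine-tuned model's delta from the base is randomly sparsified and rescaled (DARE), the weighted deltas are combined with TIES-style sign election, and the surviving deltas are added back to the base. This is an illustrative approximation, not mergekit's actual implementation; the function names and exact sign-election details here are assumptions.

```python
import numpy as np

def dare(delta, density, rng):
    # DARE: randomly drop (1 - density) of the delta parameters,
    # then rescale the survivors by 1/density so the expected value is preserved.
    mask = rng.random(delta.shape) < density
    return np.where(mask, delta / density, 0.0)

def dare_ties_merge(base, finetuned, densities, weights, seed=0):
    """Merge fine-tuned models into `base` (all arrays of the same shape).

    Simplified sketch: per-model DARE sparsification, then TIES-style
    sign election over the weighted deltas.
    """
    rng = np.random.default_rng(seed)
    # Weighted, sparsified task vectors (deltas from the base model).
    deltas = [dare(ft - base, d, rng) * w
              for ft, d, w in zip(finetuned, densities, weights)]
    # TIES sign election: per parameter, keep only contributions whose
    # sign agrees with the sign of the summed delta.
    total = sum(deltas)
    elected_sign = np.sign(total)
    merged = sum(np.where(np.sign(d) == elected_sign, d, 0.0) for d in deltas)
    return base + merged
```

With density 0.7 and weight 0.4 (the values used in this merge), roughly 30% of each model's delta parameters are dropped before the weighted, sign-filtered sum is applied to the Mistral base.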

Potential Use Cases

Given its merged nature, this model is likely to exhibit capabilities drawn from its constituent models. Developers might find it suitable for tasks requiring a blend of general instruction following (from Mistral) and Turkish-language knowledge and nuances (from the Turkcell and Trendyol models). Its 7B size offers a practical balance between output quality and computational cost.