adamo1139/Yi-34B-200K-AEZAKMI-RAW-1701

Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Jan 18, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

adamo1139/Yi-34B-200K-AEZAKMI-RAW-1701 is an experimental 34-billion-parameter language model based on Yi-34B-200K, fine-tuned with DPO on RAWrr_v1 and then with SFT on AEZAKMI_v2. It has a 32768-token context length and aims to reduce refusal rates compared to previous iterations. It averages 71.04 on the Open LLM Leaderboard, with notable scores on HellaSwag (85.79) and MMLU (75.44).


Model Overview

adamo1139/Yi-34B-200K-AEZAKMI-RAW-1701 is an experimental 34-billion-parameter language model built on the Yi-34B-200K base. It was fine-tuned in two stages: first DPO (Direct Preference Optimization) on the RAWrr_v1 dataset at a context length of 200 tokens, then SFT (Supervised Fine-Tuning) on the AEZAKMI_v2 dataset at a context length of 1400 tokens. The model supports a 32768-token context window.
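
The two-stage recipe can be summarized as a small configuration sketch. The class and field names below are hypothetical and are not taken from the author's training scripts; the LoRA values are the ones listed under Key Characteristics. The `lora_scaling` property assumes the common LoRA convention in which the adapter update is scaled by alpha / r, which this card does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneStage:
    """One fine-tuning stage (hypothetical structure, for illustration only)."""
    method: str          # "DPO" or "SFT"
    dataset: str         # dataset name as given in the model card
    context_length: int  # training context length reported in the card
    lora_r: int          # LoRA rank
    lora_alpha: int      # LoRA alpha

    @property
    def lora_scaling(self) -> float:
        # Assumes the conventional LoRA scaling factor alpha / r.
        return self.lora_alpha / self.lora_r


# Stages in the order described above: DPO first, then SFT.
STAGES = [
    FineTuneStage("DPO", "RAWrr_v1", context_length=200, lora_r=4, lora_alpha=8),
    FineTuneStage("SFT", "AEZAKMI_v2", context_length=1400, lora_r=16, lora_alpha=32),
]

for stage in STAGES:
    print(f"{stage.method} on {stage.dataset}: "
          f"ctx={stage.context_length}, r={stage.lora_r}, "
          f"alpha={stage.lora_alpha}, scaling={stage.lora_scaling}")
```

Under that assumption, both stages use the same alpha-to-rank ratio of 2, even though the SFT stage uses a rank four times larger.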

Key Characteristics

  • Reduced Refusals: This iteration is designed to be less prone to refusals compared to its predecessor, Yi-34B-200K-AEZAKMI-v2, though this remains an ongoing development focus.
  • Training Methodology: Combines DPO and SFT, each trained with LoRA adapters (r=4, alpha=8 for DPO; r=16, alpha=32 for SFT).
  • Open LLM Leaderboard Performance: Achieves an average score of 71.04 on the Open LLM Leaderboard. Specific benchmark results include:
    • HellaSwag (10-Shot): 85.79
    • MMLU (5-Shot): 75.44
    • AI2 Reasoning Challenge (25-Shot): 66.81
    • Winogrande (5-Shot): 80.35
    • GSM8k (5-Shot): 59.97
    • TruthfulQA (0-Shot): 57.91
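
As a quick sanity check, the reported average can be recomputed from the six benchmark scores listed above. This is a minimal sketch that assumes the leaderboard average is the unweighted mean of those six tasks; the dictionary name is illustrative.

```python
from statistics import mean

# Per-benchmark scores as listed in this model card.
SCORES = {
    "HellaSwag (10-Shot)": 85.79,
    "MMLU (5-Shot)": 75.44,
    "AI2 Reasoning Challenge (25-Shot)": 66.81,
    "Winogrande (5-Shot)": 80.35,
    "GSM8k (5-Shot)": 59.97,
    "TruthfulQA (0-Shot)": 57.91,
}

# Unweighted mean across the six tasks (assumed averaging method).
average = mean(SCORES.values())
print(f"Recomputed average: {average:.3f}")
```

The recomputed mean is about 71.045, which matches the reported 71.04 to within the rounding of the individual scores.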

Intended Use Cases

This model is aimed at developers and researchers who want to explore experimental large language models with reduced refusal rates and long-context capabilities. Its benchmark results suggest it can handle general language understanding and reasoning tasks, particularly where a balance between performance and refusal behavior is desired. It remains experimental and may be refined further.