RESMPDEV/Qwen1.5-Wukong-0.5B

Public · 0.6B parameters · BF16 · 32,768-token context
Released: Feb 19, 2024 · License: tongyi-qianwen-research · Hosted on Hugging Face
Overview

RESMPDEV/Qwen1.5-Wukong-0.5B is a 0.6-billion-parameter, decoder-only language model fine-tuned from the Qwen1.5-0.5B base model. It is a "dealigned" chat fine-tune with a 32K-token context length. The model was trained for three epochs on Teknium's OpenHermes-2.5 dataset, supplemented by additional datasets from Cognitive Computations.
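
As a sketch of how such a fine-tune is typically used, the snippet below loads the model with the standard transformers chat-template API. The model ID comes from this card; the prompt and generation settings are illustrative, not prescribed by the author.

```python
# Minimal usage sketch (assumes transformers >= 4.37, which added Qwen1.5 support;
# device_map="auto" additionally requires the accelerate package — drop it to load on CPU).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RESMPDEV/Qwen1.5-Wukong-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Monkey King, Sun Wukong, in one line."},
]
# Qwen1.5 tokenizers ship a ChatML chat template; apply it and generate.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```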

Key Characteristics

  • Base Model: Built on the Qwen1.5-0.5B architecture, a transformer-based, decoder-only model.
  • Training Data: Teknium's OpenHermes-2.5 dataset plus additional datasets from Cognitive Computations.
  • Dealigned Finetune: A chat fine-tune that deliberately departs from strict safety alignment, permitting less restricted conversational styles.
  • Context Length: Supports a stable context length of 32,768 tokens (verifiable from the checkpoint config, as sketched after this list).
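
A quick way to confirm the advertised limits is to inspect the checkpoint's config. This is a sketch; the field names follow the standard Qwen2 config class that Qwen1.5 checkpoints use in transformers, and the expected values are taken from this card.

```python
# Sanity-check the context length and precision reported on this card.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("RESMPDEV/Qwen1.5-Wukong-0.5B")
print(config.max_position_embeddings)  # expected: 32768
print(config.torch_dtype)              # expected: bfloat16
```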

Performance

On the Open LLM Leaderboard, the model achieves an average score of 38.15 across the standard six benchmarks (a reproduction sketch follows the list):

  • AI2 Reasoning Challenge (25-shot): 31.74
  • HellaSwag (10-shot): 47.78
  • MMLU (5-shot): 38.44
  • TruthfulQA (0-shot): 38.92
  • Winogrande (5-shot): 56.51
  • GSM8k (5-shot): 15.54
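
To re-run one of these numbers locally, a hedged sketch with EleutherAI's lm-evaluation-harness (the tooling behind the Open LLM Leaderboard) is shown below. The task name, few-shot count, and batch size are assumptions based on the leaderboard's published setup, and harness versions or prompt formats may differ, so scores are not guaranteed to match exactly.

```python
# Reproduction sketch using lm-evaluation-harness (pip install lm-eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=RESMPDEV/Qwen1.5-Wukong-0.5B,dtype=bfloat16",
    tasks=["arc_challenge"],  # ARC is run 25-shot on the leaderboard
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```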

Use Cases

This model suits developers who want a compact chat model derived from the Qwen1.5 series, particularly for applications where a "dealigned", less restricted conversational style is desired.