dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Aug 25, 2023 · Architecture: Transformer

The dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged model is a 7 billion parameter language model, likely based on the Llama architecture and refined through supervised fine-tuning (SFT). It achieves an average score of 43.96 on the Open LLM Leaderboard benchmarks, with notably strong results on HellaSwag and Winogrande. The model is suitable for applications requiring general language understanding and generation, particularly where its fine-tuning on the 'eli5' and 'wiki65k' datasets might offer specialized knowledge or response styles.


Model Overview

The dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged model is a 7 billion parameter language model. While its base architecture is not explicitly documented, the naming convention points to a foundation in the Llama family, further adapted through supervised fine-tuning (SFT); the suffix also suggests a LoRA adapter (rank 64, alpha 16) trained at a 1024-token sequence length on ELI5 and a 65k-example Wikipedia dataset, then merged back into the base weights.
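
As a minimal loading sketch, assuming the merged weights are published on the Hugging Face Hub under the repo id above and that standard transformers/accelerate tooling applies (the model card does not prescribe a specific loading recipe):

```python
# Minimal loading sketch (assumption: standard Hugging Face transformers usage).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B weights near ~14 GB
    device_map="auto",          # needs `accelerate`; remove to load on CPU
)
```

Because the LoRA adapter is already merged into the checkpoint, no separate PEFT/adapter loading step should be required.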

Performance Benchmarks

This model has been evaluated on the Open LLM Leaderboard, achieving an overall average score of 43.96. Key performance metrics include:

  • ARC (25-shot): 53.75
  • HellaSwag (10-shot): 78.76
  • MMLU (5-shot): 46.02
  • TruthfulQA (0-shot): 43.31
  • Winogrande (5-shot): 73.48
  • GSM8K (5-shot): 4.7
  • DROP (3-shot): 7.72

These scores indicate moderate capability across reasoning, common sense, and language understanding tasks, with relative strength on the commonsense benchmarks HellaSwag and Winogrande and notably weak results on GSM8K and DROP, which target math word problems and discrete reasoning over paragraphs.
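
The reported 43.96 appears to be the plain arithmetic mean of the seven benchmark scores listed above; the quick check below reproduces it.

```python
# Quick arithmetic check: the leaderboard average as the mean of the seven scores.
scores = {
    "ARC": 53.75, "HellaSwag": 78.76, "MMLU": 46.02, "TruthfulQA": 43.31,
    "Winogrande": 73.48, "GSM8K": 4.7, "DROP": 7.72,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 43.96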

Potential Use Cases

Given its fine-tuning and benchmark results, this model could be considered for:

  • General text generation and comprehension tasks, such as ELI5-style question answering (see the sketch after this list).
  • Applications requiring common sense reasoning.
  • Scenarios where a 7B parameter model offers a balance between performance and computational efficiency.
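
A hedged sketch of ELI5-style question answering with the transformers pipeline API is shown below; the plain-text prompt format and sampling settings are assumptions, since the model card documents no instruction or chat template.

```python
# Generation sketch (assumptions: the checkpoint is on the Hugging Face Hub and
# accepts plain-text prompts; sampling settings are illustrative, not tuned).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged",
    device_map="auto",  # needs `accelerate`; drop for CPU-only execution
)

prompt = "Explain like I'm five: why is the sky blue?"
result = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```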