dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Aug 25, 2023 · Architecture: Transformer

The dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged model is a 7 billion parameter Llama-based language model developed by dhmeltzer. It is fine-tuned for general language understanding and generation, with reported results on benchmarks including ARC, HellaSwag, and MMLU. With a context length of 4096 tokens, it is suited to tasks requiring broad knowledge recall and coherent text completion.


Model Overview

The dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged is a 7 billion parameter language model built upon the Llama architecture. It has been fine-tuned to enhance its general language understanding and generation capabilities, as reflected in its performance across a suite of benchmarks.
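The sketch below shows one way to load the merged weights with Hugging Face transformers. It assumes the repository id shown above is available on the Hugging Face Hub and that a GPU with roughly 16 GB of memory is used for fp16 inference; adjust the dtype or device map for other setups.

```python
# Minimal loading sketch, assuming the merged weights are published on the
# Hugging Face Hub under the repo id shown on this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 7B parameters fit on a single ~16 GB GPU in fp16
    device_map="auto",          # place layers automatically across available devices
)
```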

Key Capabilities

This model demonstrates proficiency in several areas, with notable scores on the Open LLM Leaderboard evaluations:

  • Reasoning: Achieves 54.35 on ARC (25-shot).
  • Common Sense: Scores 78.06 on HellaSwag (10-shot) and 73.4 on Winogrande (5-shot).
  • General Knowledge: Attains 45.35 on MMLU (5-shot).
  • Factuality: Records 37.11 on TruthfulQA (0-shot).

Good For

This model is well-suited for applications that benefit from balanced performance across a range of language tasks, including:

  • General text generation and completion.
  • Question answering and information retrieval where broad knowledge is beneficial.
  • Tasks benefiting from common sense reasoning.

Its 4096-token context window supports processing moderately long inputs for these applications.
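As a rough usage sketch, the snippet below generates a completion while truncating the input to the 4096-token context length stated above. The prompt is illustrative, and the sampling parameters are assumptions rather than recommended settings; it reuses the `tokenizer` and `model` objects from the loading example.

```python
# Generation sketch; truncation to 4096 tokens reflects the stated context length.
prompt = "Briefly explain how wiki-style summaries differ from abstracts."
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,   # cap the length of the completion
        do_sample=True,
        temperature=0.7,      # illustrative sampling settings, not tuned values
    )

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```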