dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged
The dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged model is a 7 billion parameter Llama-based language model developed by dhmeltzer. It is fine-tuned for general language understanding and generation, with evaluated performance on benchmarks including ARC, HellaSwag, and MMLU. With a context length of 4096 tokens, it is suitable for tasks requiring broad knowledge recall and coherent text completion.
Model Overview
The dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged is a 7 billion parameter language model built on the Llama architecture. Its name suggests supervised fine-tuning (SFT) on a wiki-derived dataset, with a LoRA adapter (rank 64, alpha 16) merged back into the base weights, though the card does not spell out the training details. The fine-tuning enhances its general language understanding and generation capabilities, as reflected in its performance across a suite of benchmarks.
Key Capabilities
This model demonstrates proficiency in several areas, with notable scores on the Open LLM Leaderboard evaluations:
- Reasoning: Achieves 54.35 on ARC (25-shot).
- Common Sense: Scores 78.06 on HellaSwag (10-shot) and 73.4 on Winogrande (5-shot).
- General Knowledge: Attains 45.35 on MMLU (5-shot).
- Factuality: Records 37.11 on TruthfulQA (0-shot).
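The scores above are reported per task. As a quick illustration, an unweighted mean of the five figures (an informal aggregate, not the Open LLM Leaderboard's official average, which uses its own task set and weighting) can be computed as:

```python
# Illustrative only: unweighted mean of the per-task scores listed above.
scores = {
    "ARC (25-shot)": 54.35,
    "HellaSwag (10-shot)": 78.06,
    "Winogrande (5-shot)": 73.4,
    "MMLU (5-shot)": 45.35,
    "TruthfulQA (0-shot)": 37.11,
}

average = sum(scores.values()) / len(scores)
print(f"Unweighted mean across {len(scores)} tasks: {average:.2f}")
```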
Good For
This model is well-suited for applications requiring a balanced performance across various language tasks, including:
- General text generation and completion.
- Question answering and information retrieval where broad knowledge is beneficial.
- Tasks benefiting from common sense reasoning.
Its 4096-token context window supports processing moderately long inputs for these applications.
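For these use cases, the merged checkpoint should load like any standard Llama-family model. A minimal sketch with Hugging Face transformers (assuming the repository hosts standard weights; the `generate` helper and its parameters here are illustrative, not part of the model card):

```python
# Hypothetical usage sketch for the merged checkpoint via transformers.
# Assumes standard Llama weights; requires the `transformers` and `torch` packages.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged"
MAX_CONTEXT = 4096  # context window reported on the model card


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion, truncating the input to the model's context window."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_CONTEXT,
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads ~13 GB of weights on first run; a GPU is strongly recommended.
    print(generate("The capital of France is"))
```

Truncating to `MAX_CONTEXT` keeps moderately long inputs within the 4096-token window rather than erroring out on overflow.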