agentlans/Llama3.1-8B-drill

Text Generation · Model Size: 8B · Quant: FP8 · Context Length: 32k · Concurrency Cost: 1 · Architecture: Transformer · Published: Dec 27, 2024

agentlans/Llama3.1-8B-drill is an 8-billion-parameter language model merged from top-performing Llama 3.1 8B models on the IFEval task, with the goal of improving instruction following. It was built with the Model Stock merge method, using Meta-Llama-3.1-8B-Instruct as the base. Although strong instruction following was the aim, the developer notes that the model performs only moderately compared to its parent models. It supports a context length of 32,768 tokens.


Llama3.1-8B-drill Overview

agentlans/Llama3.1-8B-drill is an 8 billion parameter model created by merging several high-scoring Llama 3.1 8B models from the Open LLM Leaderboard's IFEval task. The primary goal of this merge was to produce a model with enhanced instruction-following capabilities. It was developed using the Model Stock merge method, with Meta-Llama-3.1-8B-Instruct serving as the base model.
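Roughly speaking, the Model Stock method interpolates between the average of the fine-tuned models' weights and the base (anchor) model's weights, with an interpolation ratio derived from the angle between the fine-tuned models' weight deltas relative to the base. A toy NumPy sketch of that idea is below; the function names are illustrative and this is not the author's actual merge code:

```python
import numpy as np

def model_stock_ratio(cos_theta: float, k: int) -> float:
    """Interpolation ratio used by Model Stock for k fine-tuned models:
    t = k*cos(theta) / (1 + (k-1)*cos(theta)).
    As the fine-tuned deltas align (cos -> 1), t -> 1 (pure average);
    as they become orthogonal (cos -> 0), t -> 0 (stay at the base)."""
    return k * cos_theta / (1.0 + (k - 1) * cos_theta)

def model_stock_merge(base: np.ndarray, finetuned: list[np.ndarray]) -> np.ndarray:
    """Merge per-layer weights: interpolate between the fine-tuned average
    and the base anchor, weighted by how much the deltas agree."""
    k = len(finetuned)
    deltas = [w - base for w in finetuned]
    # Mean pairwise cosine similarity between the fine-tuned deltas.
    cos_vals = []
    for i in range(k):
        for j in range(i + 1, k):
            a, b = deltas[i], deltas[j]
            cos_vals.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    cos_theta = float(np.mean(cos_vals))
    t = model_stock_ratio(cos_theta, k)
    w_avg = np.mean(finetuned, axis=0)
    return t * w_avg + (1.0 - t) * base
```

When the fine-tuned deltas all point the same way (cosine 1), the merge reduces to a plain average; the real method applies this per layer across the full model.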

Key Characteristics

  • Merge of Top Performers: Combines models like Dampfinchen/Llama-3.1-8B-Ultra-Instruct, vicgalle/Configurable-Llama-3.1-8B-Instruct, allenai/Llama-3.1-Tulu-3-8B, and akjindal53244/Llama-3.1-Storm-8B.
  • Instruction Following Focus: Specifically designed to improve adherence to user instructions.
  • Context Length: Supports a context length of 32768 tokens.
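A merge like this is commonly expressed as a mergekit configuration. The following is a hypothetical sketch based on the parent models listed above, not the author's published config:

```yaml
# Hypothetical mergekit config sketch; the exact settings used by the
# author are not shown in this card.
merge_method: model_stock
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
models:
  - model: Dampfinchen/Llama-3.1-8B-Ultra-Instruct
  - model: vicgalle/Configurable-Llama-3.1-8B-Instruct
  - model: allenai/Llama-3.1-Tulu-3-8B
  - model: akjindal53244/Llama-3.1-Storm-8B
dtype: bfloat16
```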

Performance Notes

Although the merge was intended to strengthen instruction following, the developer notes that its performance, even on the IFEval task, is mediocre compared to its constituent parent models. Users who need the best instruction following may be better served by using the individual parent models directly.

Good for

  • Experimenting with model merging techniques, specifically the Model Stock method.
  • Use cases where a Llama 3.1 8B base is desired, and instruction following is a key, though not necessarily top-tier, requirement.
  • Research into how model merges impact specific task performance, particularly instruction following.