athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Jul 30, 2024 · License: apache-2.0 · Architecture: Transformer

athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit is an 8 billion parameter Llama-3.1-Instruct model further pretrained for one epoch on a filtered dataset of Reddit dirty stories. This model aims to address the repetition and token overconfidence issues observed in base Llama-3.1 models within the 8B parameter constraint. It is specifically designed for niche use cases requiring Llama-3.1's logical capabilities while mitigating its common generative pitfalls.


Model Overview

This model, athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit, is an 8 billion parameter variant of the Llama-3.1-Instruct architecture. It has undergone an additional epoch of pretraining on a curated dataset of 'dirty stories' sourced from nothingiisreal/Reddit-Dirty-And-WritingPrompts, specifically filtering out entries with scores below 2.
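
The score filter described above can be reproduced with the Hugging Face `datasets` library. The sketch below is illustrative only: the split name and the `score` column are assumptions about the dataset's schema, not details confirmed by the model card.

```python
# Hypothetical sketch of the score filter described above.
# Assumptions: a "train" split exists and entries carry a "score" column.
from datasets import load_dataset

ds = load_dataset("nothingiisreal/Reddit-Dirty-And-WritingPrompts", split="train")

# Keep only entries with a score of 2 or higher, as the model card describes.
filtered = ds.filter(lambda row: row["score"] >= 2)
print(f"kept {len(filtered)} of {len(ds)} entries")
```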

Key Characteristics

  • Base Model: Llama-3.1-Instruct, known for its logical capabilities.
  • Parameter Count: 8 billion parameters, suitable for environments with compute constraints.
  • Context Length: Supports a context length of 32768 tokens.
  • Training Focus: The primary goal of this additional pretraining was to disrupt the inherent 'repetition/token overconfidence problem' often observed in Llama-3/3.1 models, without compromising their core functionality or logical reasoning abilities.
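
As a usage reference, the following is a minimal generation sketch with Hugging Face `transformers`. The dtype, device placement, prompt, and sampling values are assumptions chosen for illustration, not settings published for this model.

```python
# Minimal generation sketch; dtype and sampling values are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on a single GPU
    device_map="auto",
)

# Llama-3.1-Instruct models use a chat template, so format the prompt with it.
messages = [{"role": "user", "content": "Write the opening line of a short story."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```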

Performance Insights

On the Open LLM Leaderboard, the model scores an average of 20.74 across the leaderboard's benchmark suite. Reported scores on individual benchmarks include:

  • IFEval (0-Shot): 45.21
  • BBH (3-Shot): 28.02
  • MMLU-PRO (5-shot): 28.50

Intended Use Case

This model is developed for users who need the logical capabilities of Llama-3.1 within an 8B parameter budget while avoiding its common generative pitfalls, such as repetition. It is particularly suited to niche applications that call for logical coherence combined with less overconfident token predictions.

Popular Sampler Settings

The three most popular parameter combinations among Featherless users for this model tune the following samplers (the specific values are presented as interactive tabs on the model page; a usage sketch follows this list):

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
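
To illustrate how such sampler settings are applied in practice, here is a hedged sketch using the OpenAI-compatible Python client against a Featherless-style endpoint. The base URL, API key, and every sampling value below are placeholders, not the actual top configurations; `top_k`, `repetition_penalty`, and `min_p` are not part of the standard OpenAI API and are passed through `extra_body`, which only some serving backends honor.

```python
# Illustrative only: sampling values are placeholders, not real Featherless configs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit",
    messages=[{"role": "user", "content": "Continue this story: ..."}],
    temperature=0.8,
    top_p=0.95,
    frequency_penalty=0.2,
    presence_penalty=0.2,
    # Non-standard sampler knobs; support depends on the serving backend.
    extra_body={"top_k": 40, "repetition_penalty": 1.1, "min_p": 0.05},
)
print(response.choices[0].message.content)
```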