AtlaAI/Selene-1-Mini-Llama-3.1-8B

Parameters: 8B
Precision: FP8
Context: 32768
License: apache-2.0

Model Overview

Atla Selene Mini is an 8-billion-parameter small language model-as-a-judge (SLMJ) developed by Atla. It is post-trained from Llama-3.1-8B and designed for robust evaluation tasks. Despite its size, it performs comparably to models roughly 10x larger, notably outperforming GPT-4o on RewardBench, EvalBiasBench, and AutoJ.

Key Capabilities

  • Advanced Evaluation: Excels across 11 benchmarks spanning absolute scoring (e.g., rating harmlessness on a 1-5 scale), classification (e.g., judging Yes/No whether a response addresses the user query), and pairwise preference (e.g., comparing two responses for logical consistency).
  • Top Performance: Ranks as the #1 8B generative model on RewardBench.
  • Multilingual Support: Primarily English, but also supports German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Structured Outputs: Generates structured evaluation outputs and provides qualitative critiques with reasoning.
  • Extended Context: Features a 128K context length, enabling comprehensive analysis of longer inputs.
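
As a sketch, the three evaluation modes above can be expressed as simple prompt-construction helpers. The prompt wording below is illustrative only, not the official Atla templates; the exact templates used during training are published in the cookbooks.

```python
# Illustrative prompt builders for the three evaluation modes.
# The wording here is an assumption for demonstration, NOT the
# official Atla training templates (see the cookbooks for those).

def absolute_scoring_prompt(response: str, criteria: str, scale: str = "1-5") -> str:
    """Ask the judge to score a single response against a rubric."""
    return (
        f"Evaluate the response below for {criteria} on a {scale} scale.\n"
        f"Response: {response}\n"
        "Give your reasoning, then the final score."
    )

def classification_prompt(query: str, response: str, condition: str) -> str:
    """Ask the judge a Yes/No question about a response."""
    return (
        f"Does the response satisfy this condition: {condition}?\n"
        f"Query: {query}\nResponse: {response}\n"
        "Answer Yes or No with a brief justification."
    )

def pairwise_prompt(query: str, response_a: str, response_b: str, attribute: str) -> str:
    """Ask the judge to pick the better of two responses."""
    return (
        f"Compare the two responses for {attribute} and pick the better one.\n"
        f"Query: {query}\nResponse A: {response_a}\nResponse B: {response_b}\n"
        "Answer A or B with a brief justification."
    )
```

Each helper returns the user-turn text; it still needs to be wrapped in the Llama 3 conversation template before generation.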

Good For

  • General-purpose evaluation: Ideal for assessing model responses, agent performance, and content quality.
  • Absolute scoring: Evaluating responses based on specific criteria and scales.
  • Classification tasks: Determining if responses meet predefined conditions.
  • Pairwise preference: Comparing and ranking responses based on desired attributes.
  • RAG hallucination detection: Evaluating whether generated answers are grounded in the retrieved context; cookbooks are provided for this and similar use cases.
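
Because the model pairs a qualitative critique with a final judgment (see Structured Outputs above), downstream code typically splits the two. The `**Reasoning:**` / `**Result:**` markers below are an assumed output format, not confirmed by this card; check the cookbook templates for the exact layout.

```python
import re

# Sketch of parsing a judge completion into (critique, verdict).
# The "**Reasoning:**" / "**Result:**" markers are an ASSUMPTION
# about the output format; adjust them to the actual template.

def parse_judgment(completion: str) -> tuple[str, str]:
    match = re.search(
        r"\*\*Reasoning:\*\*\s*(?P<critique>.*?)\s*\*\*Result:\*\*\s*(?P<verdict>.+)",
        completion,
        flags=re.DOTALL,
    )
    if match is None:
        raise ValueError("completion did not match the expected judge format")
    return match.group("critique").strip(), match.group("verdict").strip()

example = "**Reasoning:** The answer is polite and factual.\n**Result:** 5"
critique, verdict = parse_judgment(example)
```

Raising on a non-matching completion (rather than returning empty strings) makes format drift visible early in an evaluation pipeline.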

For best results, apply the Llama 3 conversation template when prompting the model; the prompt templates used during training are available in the cookbooks.
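
In practice the template is applied with the model tokenizer's `apply_chat_template`. As a dependency-free sketch, the Llama 3 special-token layout can also be written out by hand (the evaluation prompt text below is a placeholder, not a training template):

```python
# Manual sketch of the Llama 3 conversation format. In real use, prefer:
#   tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
# with the AtlaAI/Selene-1-Mini-Llama-3.1-8B tokenizer.

def llama3_chat(messages: list[dict]) -> str:
    """Render messages in the Llama 3 special-token layout."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    # Trailing assistant header cues the model to generate its judgment.
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

messages = [
    {"role": "user",
     "content": "Evaluate the response for harmlessness on a 1-5 scale. ..."},
]
prompt = llama3_chat(messages)
```

The trailing empty assistant turn is what `add_generation_prompt=True` produces, so the model continues from there with its critique and score.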