AtlaAI/Selene-1-Mini-Llama-3.1-8B

Parameters: 8B
Precision: FP8
Context: 32768
License: apache-2.0

Model Overview

Atla Selene Mini is an 8-billion-parameter small language model-as-a-judge (SLMJ) developed by Atla. It is post-trained from Llama-3.1-8B and designed for robust evaluation tasks. Despite its size, it performs comparably to models roughly 10x larger, notably outperforming GPT-4o on RewardBench, EvalBiasBench, and AutoJ.

Key Capabilities

  • Advanced Evaluation: Excels across 11 benchmarks spanning absolute scoring (e.g., rating harmlessness on a 1-5 scale), classification (e.g., judging Yes/No whether a response addresses the user query), and pairwise preference (e.g., comparing two responses for logical consistency).
  • Top Performance: Ranks as the #1 8B generative model on RewardBench.
  • Multilingual Support: Primarily English, but also supports German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Structured Outputs: Generates structured evaluation outputs and provides qualitative critiques with reasoning.
  • Extended Context: Features a 128K context length, enabling comprehensive analysis of longer inputs.
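
As a sketch, the three evaluation modes above can be expressed as simple prompt-construction helpers. The prompt wording below is illustrative only, not the official Atla templates; the exact templates used during training are published in the cookbooks.

```python
# Illustrative prompt builders for the three evaluation modes.
# The wording here is an assumption for demonstration, NOT the
# official Atla training templates (see the cookbooks for those).

def absolute_scoring_prompt(response: str, criteria: str, scale: str = "1-5") -> str:
    """Ask the judge to score a single response against a rubric."""
    return (
        f"Evaluate the response below for {criteria} on a {scale} scale.\n"
        f"Response: {response}\n"
        "Give your reasoning, then the final score."
    )

def classification_prompt(query: str, response: str, condition: str) -> str:
    """Ask the judge a Yes/No question about a response."""
    return (
        f"Does the response satisfy this condition: {condition}?\n"
        f"Query: {query}\nResponse: {response}\n"
        "Answer Yes or No with a brief justification."
    )

def pairwise_prompt(query: str, response_a: str, response_b: str, attribute: str) -> str:
    """Ask the judge to pick the better of two responses."""
    return (
        f"Compare the two responses for {attribute} and pick the better one.\n"
        f"Query: {query}\nResponse A: {response_a}\nResponse B: {response_b}\n"
        "Answer A or B with a brief justification."
    )
```

Each helper returns the user-turn text; it still needs to be wrapped in the Llama 3 conversation template before generation.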

Good For

  • General-purpose evaluation: Ideal for assessing model responses, agent performance, and content quality.
  • Absolute scoring: Evaluating responses based on specific criteria and scales.
  • Classification tasks: Determining if responses meet predefined conditions.
  • Pairwise preference: Comparing and ranking responses based on desired attributes.
  • RAG hallucination detection: Evaluating whether generated answers are grounded in the retrieved context; cookbooks are provided for this and similar use cases.
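
Because the model pairs a qualitative critique with a final judgment (see Structured Outputs above), downstream code typically splits the two. The `**Reasoning:**` / `**Result:**` markers below are an assumed output format, not confirmed by this card; check the cookbook templates for the exact layout.

```python
import re

# Sketch of parsing a judge completion into (critique, verdict).
# The "**Reasoning:**" / "**Result:**" markers are an ASSUMPTION
# about the output format; adjust them to the actual template.

def parse_judgment(completion: str) -> tuple[str, str]:
    match = re.search(
        r"\*\*Reasoning:\*\*\s*(?P<critique>.*?)\s*\*\*Result:\*\*\s*(?P<verdict>.+)",
        completion,
        flags=re.DOTALL,
    )
    if match is None:
        raise ValueError("completion did not match the expected judge format")
    return match.group("critique").strip(), match.group("verdict").strip()

example = "**Reasoning:** The answer is polite and factual.\n**Result:** 5"
critique, verdict = parse_judgment(example)
```

Raising on a non-matching completion (rather than returning empty strings) makes format drift visible early in an evaluation pipeline.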

For best results, apply the Llama 3 conversation template when prompting the model; the prompt templates used during training are available in the cookbooks.
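
In practice the template is applied with the model tokenizer's `apply_chat_template`. As a dependency-free sketch, the Llama 3 special-token layout can also be written out by hand (the evaluation prompt text below is a placeholder, not a training template):

```python
# Manual sketch of the Llama 3 conversation format. In real use, prefer:
#   tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
# with the AtlaAI/Selene-1-Mini-Llama-3.1-8B tokenizer.

def llama3_chat(messages: list[dict]) -> str:
    """Render messages in the Llama 3 special-token layout."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    # Trailing assistant header cues the model to generate its judgment.
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

messages = [
    {"role": "user",
     "content": "Evaluate the response for harmlessness on a 1-5 scale. ..."},
]
prompt = llama3_chat(messages)
```

The trailing empty assistant turn is what `add_generation_prompt=True` produces, so the model continues from there with its critique and score.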