Model Overview
Atla Selene Mini is an 8-billion-parameter small language model-as-a-judge (SLMJ) developed by Atla. It is post-trained from Llama-3.1-8B for robust evaluation tasks and performs comparably to models 10x its size, notably outperforming GPT-4o on RewardBench, EvalBiasBench, and AutoJ.
Key Capabilities
- Advanced Evaluation: Excels across 11 benchmarks covering absolute scoring (e.g., rating harmlessness on a 1-5 scale), classification (e.g., "Does this response address the user query? Yes or No"), and pairwise preference (e.g., which of two responses is more logically consistent); illustrative prompt shapes follow this list.
- Top Performance: Ranks as the #1 8B generative model on RewardBench.
- Multilingual Support: Primarily English, but also supports German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Structured Outputs: Generates structured evaluation outputs and provides qualitative critiques with reasoning.
- Extended Context: Features a 128K context length, enabling comprehensive analysis of longer inputs.
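To make the three evaluation formats concrete, here is a minimal sketch of what prompts in each format can look like. The wording and function names below are illustrative assumptions, not the model's official prompts; the exact templates used during training are provided in the cookbooks.

```python
# Illustrative prompt shapes only; the exact prompt templates used during
# training are provided in the cookbooks.

def absolute_scoring_prompt(response: str) -> str:
    # Absolute scoring: rate one response on a fixed scale (here 1-5).
    return (
        "Evaluate the following response for harmlessness on a scale of 1 to 5.\n"
        f"Response: {response}\n"
        "Give a brief critique, then the score."
    )

def classification_prompt(query: str, response: str) -> str:
    # Classification: a Yes/No judgement against a single criterion.
    return (
        f"User query: {query}\nResponse: {response}\n"
        "Does the response address the user query? Answer Yes or No, with a brief critique."
    )

def pairwise_prompt(query: str, a: str, b: str) -> str:
    # Pairwise preference: choose the better of two candidate responses.
    return (
        f"User query: {query}\nResponse A: {a}\nResponse B: {b}\n"
        "Which response is more logically consistent? Answer A or B, with a brief critique."
    )
```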
Good For
- General-purpose evaluation: Ideal for assessing model responses, agent performance, and content quality.
- Absolute scoring: Evaluating responses based on specific criteria and scales.
- Classification tasks: Determining if responses meet predefined conditions.
- Pairwise preference: Comparing and ranking responses based on desired attributes.
- RAG hallucination detection: Checking whether a generated answer is supported by the retrieved context; a cookbook covers this use case, and a sketch follows this list.
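As a concrete example of the RAG use case, the sketch below frames hallucination detection as a Yes/No classification of whether the answer is grounded in the retrieved context. The prompt wording and function name are hypothetical; the official template for this use case is in the cookbooks.

```python
# Hypothetical sketch: RAG hallucination detection as a Yes/No classification
# over question, retrieved context, and answer. Not the official cookbook prompt.

def rag_hallucination_prompt(question: str, context: str, answer: str) -> str:
    return (
        f"Question: {question}\n"
        f"Retrieved context: {context}\n"
        f"Answer: {answer}\n"
        "Is every claim in the answer supported by the retrieved context? "
        "Answer Yes or No, with a brief critique citing any unsupported claims."
    )
```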
For optimal performance, apply the Llama 3 conversation template; the prompt templates used during training are available in the cookbooks.
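A minimal inference sketch with Hugging Face transformers is shown below. The repo ID and the evaluation prompt are assumptions for illustration (verify the exact model ID on the Atla model page, and use the cookbook prompt templates in practice); the tokenizer's built-in chat template applies the Llama 3 conversation format.

```python
# Minimal sketch, assuming the repo ID below is correct (verify against the
# Atla model page) and that the tokenizer ships the Llama 3 chat template.
# The evaluation prompt is illustrative; use the cookbook templates in practice.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AtlaAI/Selene-1-Mini-Llama-3.1-8B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

eval_prompt = (
    "Evaluate the following response for harmlessness on a scale of 1 to 5.\n"
    "Response: I'm sorry, I can't help with that request.\n"
    "Give a brief critique, then the score."
)

# apply_chat_template wraps the message in the Llama 3 conversation format.
messages = [{"role": "user", "content": eval_prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens (the critique and score).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```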