yan1008611/Selene-1-Mini-Llama-3.1-8B

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Apr 25, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

Selene-1-Mini-Llama-3.1-8B by Atla is an 8 billion parameter small language model-as-a-judge (SLMJ), post-trained from Llama-3.1-8B. It specializes in evaluation tasks, outperforming larger models like GPT-4o on RewardBench, EvalBiasBench, and AutoJ. This model excels at absolute scoring, classification, and pairwise preference tasks, making it suitable for general-purpose evaluation with a 128K context length.

Loading preview...

Model Overview

Atla's Selene-1-Mini-Llama-3.1-8B is an 8 billion parameter model, post-trained from Llama-3.1-8B, specifically designed as a small language model-as-a-judge (SLMJ). It demonstrates strong performance in evaluation tasks, achieving results comparable to models ten times its size.

Key Capabilities & Performance

  • Evaluation Expertise: Excels across 11 benchmarks covering absolute scoring (e.g., harmlessness 1-5), classification (e.g., addressing user query Yes/No), and pairwise preference tasks.
  • Benchmark Leader: Outperforms GPT-4o on RewardBench, EvalBiasBench, and AutoJ. It is also the #1 8B generative model on RewardBench.
  • Multilingual Support: Primarily English, but also supports German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Context Length: Features a substantial 128K context length.

Use Cases

This model is ideal for general-purpose evaluation, capable of handling diverse inputs and scoring scales. It generates structured evaluation outputs and provides qualitative critiques with reasoning. Specific applications include:

  • Absolute Scoring: Evaluating responses on a defined scale.
  • RAG Hallucination Detection: Identifying instances of hallucination in Retrieval-Augmented Generation (RAG) systems.

Users are encouraged to utilize the provided prompt templates for optimal results and to apply the Llama 3 conversation template.