yan1008611/Selene-1-Mini-Llama-3.1-8B
Selene-1-Mini-Llama-3.1-8B by Atla is an 8 billion parameter small language model-as-a-judge (SLMJ), post-trained from Llama-3.1-8B. It specializes in evaluation tasks, outperforming larger models like GPT-4o on RewardBench, EvalBiasBench, and AutoJ. This model excels at absolute scoring, classification, and pairwise preference tasks, making it suitable for general-purpose evaluation with a 128K context length.
Model Overview
Atla's Selene-1-Mini-Llama-3.1-8B is an 8 billion parameter model, post-trained from Llama-3.1-8B, specifically designed as a small language model-as-a-judge (SLMJ). It demonstrates strong performance in evaluation tasks, achieving results comparable to models ten times its size.
Key Capabilities & Performance
- Evaluation Expertise: Excels across 11 benchmarks covering absolute scoring (e.g., rating harmlessness on a 1-5 scale), classification (e.g., does the response address the user's query? Yes/No), and pairwise preference tasks.
- Benchmark Leader: Outperforms GPT-4o on RewardBench, EvalBiasBench, and AutoJ. It is also the #1 8B generative model on RewardBench.
- Multilingual Support: Primarily English, with additional support for German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Context Length: Features a substantial 128K context length.
Use Cases
This model is ideal for general-purpose evaluation, capable of handling diverse inputs and scoring scales. It generates structured evaluation outputs and provides qualitative critiques with reasoning. Specific applications include:
- Absolute Scoring: Evaluating responses on a defined scale.
- RAG Hallucination Detection: Identifying instances of hallucination in Retrieval-Augmented Generation (RAG) systems.
For best results, use the provided prompt templates and apply the Llama 3 conversation template when formatting inputs.
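As a minimal sketch of the workflow above, the snippet below assembles an absolute-scoring evaluation prompt and wraps it in the chat-message structure that `tokenizer.apply_chat_template` expects for Llama 3 models. The template wording and the `build_judge_prompt` helper are illustrative assumptions, not Atla's official prompt template.

```python
# Minimal sketch: building an absolute-scoring judge prompt for Selene-Mini.
# The prompt wording below is illustrative, NOT Atla's official template --
# substitute the prompt templates provided with the model for real use.

def build_judge_prompt(instruction: str, response: str,
                       criteria: str, scale: str = "1-5") -> str:
    """Compose a single evaluation prompt for absolute scoring."""
    return (
        f"Evaluate the response below against the criteria, then output "
        f"a score on a {scale} scale followed by a brief critique.\n\n"
        f"Criteria: {criteria}\n\n"
        f"Instruction: {instruction}\n\n"
        f"Response: {response}"
    )

prompt = build_judge_prompt(
    instruction="Summarize the article in one sentence.",
    response="The article covers several topics.",
    criteria="Harmlessness and faithfulness to the source",
)

# Wrap the prompt in the message format consumed by the Llama 3
# conversation template (via tokenizer.apply_chat_template).
messages = [{"role": "user", "content": prompt}]
print(messages[0]["content"][:40])
```

Passing `messages` through the tokenizer's chat template (with `add_generation_prompt=True`) yields the final input string for generation.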