nayohan/llama3-8b-it-prometheus-ko
The nayohan/llama3-8b-it-prometheus-ko model is an 8-billion-parameter Llama 3 instruction-tuned language model with an 8192-token context length. Developed by nayohan, it is fine-tuned on a Korean-translated version of the Prometheus Feedback-Collection dataset. The model evaluates responses against a given rubric, producing detailed feedback and a numerical score, which makes it suitable for automated content assessment and quality control in Korean.
Overview
This model, nayohan/llama3-8b-it-prometheus-ko, is an 8-billion-parameter Llama 3 instruction-tuned language model. nayohan built it by translating the prometheus-eval/Feedback-Collection dataset into Korean and fine-tuning the base Llama 3-8B-IT model on the resulting Korean data. Its primary purpose is to provide detailed feedback and assign a numerical score (1-5) to a given response, based on a provided instruction, an optional reference answer, and a score rubric.
Key Capabilities
- Automated Evaluation: The model can evaluate responses against specific criteria, generating both qualitative feedback and a quantitative score.
- Korean Language Support: Fine-tuned specifically for Korean, it processes and generates feedback in Korean.
- Rubric-Based Assessment: It strictly adheres to a given score rubric, ensuring evaluations are consistent and focused on predefined criteria.
- Flexible Input: Can perform evaluations with or without a reference answer, adapting to different assessment scenarios.
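The evaluation inputs listed above (instruction, response, optional reference answer, score rubric) can be assembled into a single evaluator prompt. A minimal sketch follows, assuming a Prometheus-style template with a `Feedback: ... [RESULT] n` output convention; the exact template this checkpoint was trained on is an assumption and may differ:

```python
def build_eval_prompt(instruction, response, rubric, reference=None):
    """Assemble a Prometheus-style evaluation prompt.

    The section headers and output-format instructions below follow the
    Prometheus convention; the wording this model expects is an assumption.
    """
    parts = [
        "###Task Description:",
        "An instruction, a response to evaluate, and a score rubric are given.",
        "1. Write detailed feedback assessing the response strictly "
        "based on the given score rubric.",
        "2. After the feedback, write a score that is an integer between 1 and 5.",
        "3. The output format should look as follows: "
        '"Feedback: (feedback) [RESULT] (an integer between 1 and 5)"',
        "",
        "###The instruction to evaluate:",
        instruction,
        "",
        "###Response to evaluate:",
        response,
    ]
    if reference is not None:  # reference answer is optional
        parts += ["", "###Reference Answer (Score 5):", reference]
    parts += ["", "###Score Rubrics:", rubric, "", "###Feedback:"]
    return "\n".join(parts)
```

Because `reference` defaults to `None`, the same helper covers both assessment scenarios: evaluation with a gold reference answer and reference-free evaluation.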
Use Cases
- Content Quality Assurance: Automatically assess the quality of generated text, customer service responses, or educational content in Korean.
- Feedback Generation: Provide structured and detailed feedback for various text-based tasks.
- Research and Development: Useful for researchers working on automated evaluation systems for Korean language models.
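For use cases like these, the model's raw output must be split into the qualitative feedback and the quantitative score. A hypothetical parsing helper is sketched below, assuming the `[RESULT] n` marker convention used by Prometheus-style evaluators:

```python
import re

def parse_evaluation(output: str):
    """Split evaluator output into (feedback, score).

    Assumes the 'Feedback: ... [RESULT] n' convention; returns
    (output, None) when no score marker is found so callers can
    detect malformed generations.
    """
    match = re.search(r"\[RESULT\]\s*([1-5])", output)
    if match is None:
        return output.strip(), None
    feedback = output[: match.start()].strip()
    if feedback.startswith("Feedback:"):
        feedback = feedback[len("Feedback:"):].strip()
    return feedback, int(match.group(1))

# Example with Korean feedback, as this model generates:
fb, score = parse_evaluation(
    "Feedback: 응답이 루브릭의 기준을 충실히 따릅니다. [RESULT] 4"
)
```

Returning `None` for a missing score lets a quality-assurance pipeline flag and retry generations that do not follow the trained output format.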
Performance Insights
Evaluation on a 200-sample Korean test set (derived from nayohan/feedback-collection-ko-chat) uses a heatmap of predicted versus reference scores to show how closely the model's assigned scores align with the correct answers. The model demonstrates strong adherence to the evaluation framework it was trained on.