yapeichang/Llama-3.1-8B
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quantization: FP8 | Context Length: 32k | Published: May 27, 2025 | License: llama3.1 | Architecture: Transformer

The yapeichang/Llama-3.1-8B model is an 8-billion-parameter language model, initialized from Llama-3.1-8B and developed by Yapei Chang and collaborators. It is fine-tuned with BLEUBERI, an approach that uses BLEU as a direct reward signal in GRPO (Group Relative Policy Optimization) training. The model excels at general instruction-following tasks, demonstrating performance comparable to reward-model-guided systems while producing more factually grounded outputs.
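To make the training signal concrete, below is a minimal sketch of a BLEU-based reward function in the spirit of BLEUBERI, not the authors' actual training code. It assumes the `sacrebleu` package and a GRPO trainer that accepts reward callables over batches of completions (the signature follows the convention used by TRL's `GRPOTrainer`, and the `references` dataset column is an illustrative assumption).

```python
# Minimal sketch: BLEU as a direct RL reward, in the spirit of BLEUBERI.
# Assumes `sacrebleu` is installed and that the trainer passes a `references`
# dataset column alongside the generated completions (an assumption, not the
# authors' exact setup).
import sacrebleu

def bleu_reward(prompts, completions, references, **kwargs):
    """Return one reward per completion: sentence-level BLEU vs. its reference."""
    rewards = []
    for completion, reference in zip(completions, references):
        # sacrebleu reports BLEU on a 0-100 scale; rescale to 0-1 for RL use.
        score = sacrebleu.sentence_bleu(completion, [reference]).score
        rewards.append(score / 100.0)
    return rewards
```

In TRL, for example, such a function could be passed as `reward_funcs=[bleu_reward]` to `GRPOTrainer`; replacing a learned reward model with this cheap string-overlap metric is the core idea behind the BLEUBERI approach.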
