yapeichang/Llama-3.1-8B
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quantization: FP8 | Context Length: 32k | Published: May 27, 2025 | License: llama3.1 | Architecture: Transformer

The yapeichang/Llama-3.1-8B model is an 8-billion-parameter language model, initialized from Llama-3.1-8B and developed by Yapei Chang and collaborators. It is fine-tuned with BLEUBERI, an approach that uses BLEU as a direct reward signal in GRPO (Group Relative Policy Optimization) training. The model excels at general instruction-following tasks, demonstrating performance comparable to reward-model-guided systems while producing more factually grounded outputs.
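To make the training signal concrete, below is a minimal sketch of a BLEU-based reward function in the spirit of BLEUBERI, not the authors' actual training code. It assumes the `sacrebleu` package and a GRPO trainer that accepts reward callables over batches of completions (the signature follows the convention used by TRL's `GRPOTrainer`, and the `references` dataset column is an illustrative assumption).

```python
# Minimal sketch: BLEU as a direct RL reward, in the spirit of BLEUBERI.
# Assumes `sacrebleu` is installed and that the trainer passes a `references`
# dataset column alongside the generated completions (an assumption, not the
# authors' exact setup).
import sacrebleu

def bleu_reward(prompts, completions, references, **kwargs):
    """Return one reward per completion: sentence-level BLEU vs. its reference."""
    rewards = []
    for completion, reference in zip(completions, references):
        # sacrebleu reports BLEU on a 0-100 scale; rescale to 0-1 for RL use.
        score = sacrebleu.sentence_bleu(completion, [reference]).score
        rewards.append(score / 100.0)
    return rewards
```

In TRL, for example, such a function could be passed as `reward_funcs=[bleu_reward]` to `GRPOTrainer`; replacing a learned reward model with this cheap string-overlap metric is the core idea behind the BLEUBERI approach.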
