NhatHoang2002/llama3.1-8b-instruct-step-dpo
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Dec 14, 2025 · License: llama3.1 · Architecture: Transformer · Status: Cold

NhatHoang2002/llama3.1-8b-instruct-step-dpo is an 8-billion-parameter instruction-tuned language model, fine-tuned from Meta's Llama-3.1-8B-Instruct. The model specializes in mathematical reasoning, having been optimized with Step-DPO (step-wise Direct Preference Optimization) on the xinlai/Math-Step-DPO-10K dataset. It supports a 32,768-token context length, making it suitable for tasks requiring detailed step-by-step problem-solving.
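Since the model is fine-tuned from Llama-3.1-8B-Instruct, it expects prompts in the standard Llama 3.1 chat format. The sketch below assembles a single-turn prompt by hand to make that format explicit; the function name and the example messages are illustrative, and in practice the tokenizer's built-in chat template should be preferred.

```python
def build_llama31_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3.1 instruct chat format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Trailing assistant header cues the model to generate its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama31_prompt(
    "You are a careful math tutor. Reason step by step.",
    "If 3x + 5 = 20, what is x?",
)

# In practice, prefer the tokenizer's own template (hypothetical usage,
# assuming the Hugging Face `transformers` library):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("NhatHoang2002/llama3.1-8b-instruct-step-dpo")
#   prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```

The manual builder is only for seeing the token layout; `apply_chat_template` stays in sync with the model's actual tokenizer configuration.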
