NhatHoang2002/llama3.1-8b-instruct-step-dpo

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Dec 14, 2025 · License: llama3.1 · Architecture: Transformer

NhatHoang2002/llama3.1-8b-instruct-step-dpo is an 8 billion parameter instruction-tuned language model, fine-tuned from Meta's Llama-3.1-8B-Instruct. The model specializes in mathematical reasoning, having been optimized on the xinlai/Math-Step-DPO-10K dataset. With a 32,768-token context length, it is well suited to tasks requiring detailed step-by-step problem-solving.


Model Overview

This model, llama3.1-8b-instruct-step-dpo, is an 8 billion parameter instruction-tuned language model. It is a fine-tuned version of the meta-llama/Llama-3.1-8B-Instruct base model, developed by NhatHoang2002.

Key Capabilities

  • Mathematical Reasoning: The model has been specifically fine-tuned using the xinlai/Math-Step-DPO-10K dataset, indicating an optimization for tasks that require step-by-step mathematical problem-solving and logical deduction.
  • Instruction Following: As an instruction-tuned model, it is designed to follow user prompts and instructions effectively, leveraging its Llama 3.1 base.
  • Extended Context: With a context length of 32768 tokens, it can process and generate longer sequences of text, beneficial for complex problems or multi-turn conversations.
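The capabilities above can be exercised through the standard Hugging Face `transformers` chat interface. The sketch below is a minimal, hedged example: the model ID comes from this card, but the system prompt wording and generation settings are illustrative assumptions, not documented by the author.

```python
MODEL_ID = "NhatHoang2002/llama3.1-8b-instruct-step-dpo"


def build_messages(question: str) -> list[dict]:
    """Wrap a math question in the chat-message format used by apply_chat_template.

    The system prompt here is an illustrative assumption, not from the model card.
    """
    return [
        {"role": "system", "content": "Solve the problem step by step."},
        {"role": "user", "content": question},
    ]


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Run one generation pass; requires the `transformers` and `torch` packages."""
    # Heavy imports are kept inside the function so the sketch can be read
    # and the helper above used without the model downloaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("What is the sum of the first 10 positive integers?"))
```

Since Llama 3.1 chat formatting is handled by `apply_chat_template`, no manual special-token markup is needed in the prompt itself.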

Training Details

The model was trained with a learning rate of 5e-07 and a total batch size of 64 across 4 GPUs, for 4 epochs with the Adam optimizer and a cosine learning rate schedule. This DPO (Direct Preference Optimization) fine-tuning on a specialized mathematical dataset aims to enhance its performance on structured reasoning tasks.
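To make the DPO objective concrete, here is a minimal per-pair sketch of the standard DPO loss (Rafailov et al.): the policy is pushed to increase the log-probability margin of the preferred ("chosen") answer over the rejected one, relative to the frozen reference model. The `beta=0.1` default is a common choice, not a value confirmed by this model card.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log sigmoid(beta * margin).

    Each argument is the summed log-probability of a full response under the
    policy or the frozen reference model. `beta` controls how strongly the
    policy is kept close to the reference (0.1 is a common default, assumed here).
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log sigmoid(z) == log(1 + exp(-z)); log1p form is numerically stable.
    return math.log1p(math.exp(-beta * margin))
```

At initialization the policy equals the reference, so the margin is 0 and the loss is log 2 ≈ 0.693; the loss falls as the policy's preference for the chosen response grows.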

Good For

  • Applications requiring detailed mathematical problem-solving.
  • Educational tools for explaining mathematical concepts step-by-step.
  • Tasks where logical reasoning and instruction adherence are critical.