Name: sleeepeer/llama3-warm_up-dolly_new_1200_0113-42-202601130042 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: sleeepeer

Model Overview

This model, sleeepeer/llama3-warm_up-dolly_new_1200_0113-42-202601130042, is an 8 billion parameter language model derived from sleeepeer/meta-llama-Llama-3.1-8B-Instruct-sanitization-clean-OPI_SEP-42-202601102333. It has been fine-tuned using the TRL (Transformer Reinforcement Learning) framework.

Key Capabilities

Mathematical Reasoning: The model's primary differentiator is its training with GRPO (Group Relative Policy Optimization), a method introduced in the DeepSeekMath paper. The technique is designed to improve mathematical reasoning in large language models.
Instruction Following: As a fine-tuned instruction model, it is intended to interpret and carry out user prompts.
Llama 3.1 Base: Built upon the Llama 3.1 architecture, it inherits the strong foundational capabilities of this family of models.

Training Details

The model was trained using the TRL library with the following framework versions: TRL 0.26.2, Transformers 4.56.2, PyTorch 2.9.0, Datasets 4.4.2, and Tokenizers 0.22.1. The GRPO method used in training was introduced in the 2024 DeepSeekMath paper.
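
To illustrate the idea behind GRPO, the sketch below computes group-relative advantages: several completions are sampled for one prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group. This is a minimal illustration of that normalization step only, not this model's training code. The function name, the reward values, the `eps` constant, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar score per completion sampled for the same
    prompt. `eps` (an assumed constant) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one math prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average receive a positive advantage and those below it a negative one, so the policy update depends only on how a completion compares with its siblings rather than on a separately trained value model.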

Recommended Use Cases

This model is particularly well-suited for applications requiring advanced mathematical problem-solving, logical deduction, and general instruction-following where numerical accuracy and reasoning are critical.
