sleeepeer/llama3-warm_up-dolly_new_1200_0113-42-202601130042

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Jan 13, 2026Architecture:Transformer Cold

The sleeepeer/llama3-warm_up-dolly_new_1200_0113-42-202601130042 model is an 8 billion parameter language model, fine-tuned from a Llama 3.1 base using the GRPO method. This model is specifically optimized for mathematical reasoning tasks, leveraging a technique introduced in the DeepSeekMath paper. It is designed to enhance the mathematical capabilities of large language models, making it suitable for applications requiring robust numerical and logical problem-solving.

Loading preview...

Model Overview

This model, sleeepeer/llama3-warm_up-dolly_new_1200_0113-42-202601130042, is an 8 billion parameter language model derived from sleeepeer/meta-llama-Llama-3.1-8B-Instruct-sanitization-clean-OPI_SEP-42-202601102333. It has been fine-tuned using the TRL (Transformer Reinforcement Learning) framework.

Key Capabilities

  • Mathematical Reasoning: The model's primary differentiator is its training with GRPO (Guided Reasoning Policy Optimization), a method detailed in the DeepSeekMath paper. This technique is designed to significantly enhance mathematical reasoning abilities in large language models.
  • Instruction Following: As a fine-tuned instruction model, it is capable of understanding and executing user prompts effectively.
  • Llama 3.1 Base: Built upon the Llama 3.1 architecture, it inherits the strong foundational capabilities of this family of models.

Training Details

The model was trained using the TRL library, with specific framework versions including TRL 0.26.2, Transformers 4.56.2, Pytorch 2.9.0, Datasets 4.4.2, and Tokenizers 0.22.1. The GRPO method, central to its mathematical optimization, was introduced in the 2024 DeepSeekMath research.

Recommended Use Cases

This model is particularly well-suited for applications requiring advanced mathematical problem-solving, logical deduction, and general instruction-following where numerical accuracy and reasoning are critical.