sleeepeer/meta-llama-Llama-3.1-8B-Instruct-sanitization-dolly-alpaca-5k-0202-42-202602051312
TEXT GENERATION
- Concurrency Cost: 1
- Model Size: 8B
- Quant: FP8
- Ctx Length: 32K
- Published: Feb 5, 2026
- Architecture: Transformer
- Status: Cold

This is an 8-billion-parameter instruction-tuned model, fine-tuned by sleeepeer from Meta's Llama 3.1-8B-Instruct using GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath work, with training run through TRL. The fine-tune targets improved mathematical reasoning. The model retains the base model's 32K context length, making it suitable for tasks that require extensive context.
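Since the card gives only a prose description, a minimal inference-side sketch may help. It assumes the checkpoint is published on the Hugging Face Hub under the repository name in the card title, and shows the Llama 3.1 chat format that the tokenizer's chat template would normally produce; building it by hand here keeps the example self-contained.

```python
# Sketch: render a Llama 3.1-style chat prompt by hand. In practice you
# would call tokenizer.apply_chat_template; the repo id below is taken
# from the card title and is an assumption about where the weights live.
MODEL_ID = (
    "sleeepeer/meta-llama-Llama-3.1-8B-Instruct-"
    "sanitization-dolly-alpaca-5k-0202-42-202602051312"
)

def format_llama31_prompt(messages):
    """Render a list of {'role', 'content'} dicts in the Llama 3.1 chat format."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama31_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve: 12 * 7"},
])
```

With transformers installed, `AutoTokenizer.from_pretrained(MODEL_ID).apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` should yield the same string, and the model itself can be loaded with `AutoModelForCausalLM.from_pretrained(MODEL_ID)`.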
