Jackrong/Llama-3.1-8B-Think-Zero-GRPO
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · License: apache-2.0 · Architecture: Transformer · Open Weights

Jackrong/Llama-3.1-8B-Think-Zero-GRPO is an 8-billion-parameter language model developed by Jackrong, fine-tuned from unsloth/Llama-3.1-8B-Instruct with a 32,768-token context length. The model was trained exclusively with Group Relative Policy Optimization (GRPO), with a focus on mathematical principles and minimal cold-start data. It serves as an intermediate version of Llama3.1-8B-Thinking-R1, demonstrating this GRPO-only, low-cold-start training methodology.
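The snippet below is a minimal inference sketch using the Hugging Face transformers library. It assumes the checkpoint is available on the Hub under the repo id above and follows the standard Llama 3.1 chat template; the example prompt, dtype, and generation parameters are illustrative only.

```python
# Minimal inference sketch (assumptions: Hub-hosted checkpoint, standard Llama 3.1 chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jackrong/Llama-3.1-8B-Think-Zero-GRPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit an 8B model on a single GPU
    device_map="auto",
)

# Example math-style prompt; the model's chat template builds the final input string.
messages = [
    {"role": "user", "content": "Prove that the sum of two even integers is even."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, temperature=0.6)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```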
