sparkle-reasoning/SparkleRL-7B-Stage1
Task: text generation
Concurrency cost: 1
Model size: 7.6B
Quantization: FP8
Context length: 32k
License: MIT
Architecture: Transformer

SparkleRL-7B-Stage1 is a 7.6-billion-parameter causal language model from sparkle-reasoning. It is the Stage 1 RL-tuned model from the research described in "Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning." The model is optimized for mathematical reasoning, using reinforcement learning to strengthen its problem-solving ability, and supports a context length of 131,072 tokens for long, complex analytical tasks.
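As a rough sketch of how such a model might be used, the snippet below loads it with the Hugging Face `transformers` library and generates a solution for a math problem. This assumes the checkpoint is published on the Hugging Face Hub under the id shown; the helper name `generate_solution` and the generation parameters are illustrative, not part of the official model card.

```python
# Hedged sketch: querying SparkleRL-7B-Stage1 via Hugging Face transformers.
# Assumes the checkpoint is available on the Hub under this id.
MODEL_ID = "sparkle-reasoning/SparkleRL-7B-Stage1"

def generate_solution(problem: str, max_new_tokens: int = 512) -> str:
    """Generate a reasoning trace for a math problem.

    The heavy dependencies are imported lazily so the module can be
    inspected without downloading the 7.6B-parameter weights.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # keep the checkpoint's native precision
        device_map="auto",    # place layers on available GPU(s)/CPU
    )
    inputs = tokenizer(problem, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
```

For a quantized FP8 deployment as listed above, an inference server such as vLLM would typically be used instead; the plain `transformers` path shown here is the simplest starting point.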
