Thrillcrazyer/Qwen-7B_TAC_RLOO
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Jan 6, 2026 · Architecture: Transformer · Cold

Thrillcrazyer/Qwen-7B_TAC_RLOO is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct and optimized for mathematical reasoning. It was trained on the DeepMath-103k dataset using RLOO (REINFORCE Leave-One-Out), a REINFORCE-style policy-gradient method originally proposed for learning from human feedback. The model targets complex mathematical problem-solving and related analytical tasks. The base model supports contexts of up to 131,072 tokens; this listing serves it with a 32k context length.
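
To make the RLOO method mentioned above concrete, here is a minimal sketch of its leave-one-out baseline: for each prompt, several completions are sampled, and each completion's advantage is its reward minus the mean reward of the other completions for the same prompt. The function name `rloo_advantages` and the binary correctness rewards in the example are assumptions for illustration only; the reward function and hyperparameters actually used to train this model are not stated here.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for k completions sampled from one prompt.

    Advantage of sample i = r_i - mean(r_j for j != i).
    (Hypothetical helper for illustration, not the model author's code.)
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # (total - r) / (k - 1) is the mean reward of the other k - 1 samples.
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    # Assumed example: 1.0 if a sampled answer is correct, 0.0 otherwise.
    print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))
```

In a policy-gradient update, each advantage would scale the log-probability gradient of its completion. The advantages for a prompt always sum to zero, so completions that beat their siblings are reinforced and the rest are pushed down, without a separately learned value model.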
