cjiao/goldengoose-corr-v2-0.80-100
cjiao/goldengoose-corr-v2-0.80-100 is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned by cjiao from Qwen/Qwen2.5-1.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, with the aim of improving reasoning quality. With a context length of 32768 tokens, the model targets tasks requiring improved reasoning and response coherence, particularly conversational or question-answering scenarios.
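With a 32768-token context window, long multi-turn conversations eventually need truncation. Below is a minimal, illustrative sketch of one common strategy (keep the system message, drop the oldest turns first). The 4-characters-per-token estimate is a rough assumption for demonstration only; a real implementation would count tokens with the model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Replace with the model's tokenizer for accurate counts.
    return max(1, len(text) // 4)

def trim_history(messages, max_tokens):
    """Keep the system message (if any) plus the most recent turns
    that fit within the token budget."""
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for m in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(m["content"])
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "x" * 4000},   # an old, long turn
    {"role": "user", "content": "Latest question?"},
]
trimmed = trim_history(history, max_tokens=100)
```

In this sketch the old 4000-character turn is dropped while the system message and the latest question survive, keeping the prompt within budget.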
Overview
cjiao/goldengoose-corr-v2-0.80-100 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was developed by cjiao using the TRL framework with the GRPO (Group Relative Policy Optimization) training method. GRPO was introduced in the "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" paper, suggesting an emphasis on improving reasoning capabilities.
Key Capabilities
- Enhanced Reasoning: Trained with GRPO, a method associated with gains in mathematical reasoning, which may carry over to more logically coherent responses.
- Instruction Following: Built upon an instruction-tuned base model, it is designed to follow user prompts effectively.
- Conversational AI: Suitable for generating coherent and contextually relevant text in interactive scenarios, such as question answering or dialogue generation.
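For conversational use, models in the Qwen2.5-Instruct family expect prompts in the ChatML format. In practice you would let `tokenizer.apply_chat_template` from `transformers` render this for you; the sketch below hand-builds the same layout purely to show its structure (the helper name is illustrative, not part of any library):

```python
def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts in the ChatML layout used
    by the Qwen2.5 instruct family, ending with an open assistant turn
    for the model to complete."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation starts here
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
])
```

The resulting string ends with an open `<|im_start|>assistant` turn, so the model's generated tokens continue the assistant's reply until it emits `<|im_end|>`.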
Good for
- General Text Generation: Creating diverse text outputs based on given prompts.
- Question Answering Systems: Providing detailed and reasoned answers to complex questions.
- Exploratory AI Development: Researchers and developers looking to experiment with models fine-tuned using advanced reinforcement learning techniques like GRPO, especially in the 1.5B parameter class.