hkust-nlp/Qwen-2.5-1.5B-SimpleRL-Zoo

Available on Hugging Face

Task: Text Generation · Model Size: 1.5B · Quantization: BF16 · Context Length: 32k · Published: Mar 24, 2025 · License: apache-2.0 · Architecture: Transformer

Qwen-2.5-1.5B-SimpleRL-Zoo is a 1.5-billion-parameter language model released by hkust-nlp as part of the SimpleRL-Zoo project. Built on the Qwen 2.5 family and fine-tuned with the SimpleRL recipe (a simple reinforcement-learning approach), it targets stronger reasoning and generation quality while retaining the base model's 32k-token context window, making it suitable for applications that need robust language understanding and generation over long inputs.


Qwen-2.5-1.5B-SimpleRL-Zoo Overview

Developed by hkust-nlp, this model is a 1.5-billion-parameter variant of the Qwen 2.5 series. It is distinguished by the use of SimpleRL (a straightforward reinforcement-learning recipe) during post-training, which aims to improve performance on reasoning and other language tasks. The model keeps Qwen 2.5's long context window of 32k tokens, allowing it to process and generate lengthy sequences of text.

Key Capabilities

  • Reinforcement Learning Fine-tuning: Leverages SimpleRL for potentially improved instruction following and response quality.
  • Large Context Window: Handles inputs and outputs of up to 32k tokens, beneficial for tasks requiring extensive contextual understanding.
  • Qwen 2.5 Architecture: Built upon the foundational Qwen 2.5 model family, inheriting its core language processing strengths.
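To make use of the long context window in practice, inputs must be kept within the model's token budget. The sketch below is an illustration, not part of the model's tooling: it splits a long document into window-sized chunks using a rough 4-characters-per-token heuristic. Real code should count tokens with the model's own tokenizer, and the limit should match your deployed configuration.

```python
# Illustrative sketch: split a long document into chunks that should each
# fit in one context window. The 4-chars-per-token ratio is a rough
# heuristic for English text, not an exact measurement.

CTX_TOKENS = 32_768       # context limit listed in the model card header
CHARS_PER_TOKEN = 4       # rough heuristic; use the real tokenizer in practice
RESERVED = 1_024          # tokens reserved for the prompt and generation

def chunk_document(text: str, ctx_tokens: int = CTX_TOKENS) -> list[str]:
    """Split `text` into pieces sized to fit one context window each."""
    budget_chars = (ctx_tokens - RESERVED) * CHARS_PER_TOKEN
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]

doc = "word " * 200_000   # ~1M characters of filler text
chunks = chunk_document(doc)
print(len(chunks), max(len(c) for c in chunks))
```

Chunks can then be processed independently (or with overlap) and their outputs merged, a common pattern for feeding very long documents to a fixed-window model.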

Good For

  • Long-form Content Generation: Ideal for applications that require generating or understanding lengthy documents, articles, or conversations.
  • Context-heavy Tasks: Suitable for tasks where maintaining coherence and relevance over extended text is crucial, such as summarization of large texts or complex question answering.
  • Research in RL-tuned LLMs: Provides a base for exploring the impact and effectiveness of SimpleRL methods on large language models.