ehristoforu/coolqwen-3b-it

Text Generation · Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: Jan 2, 2025 · License: qwen-research · Architecture: Transformer

The ehristoforu/coolqwen-3b-it model is an instruction-tuned, 3.09-billion-parameter causal language model from the Qwen2.5 series, developed by the Qwen Team. It supports a 32,768-token context length and shows significant improvements in coding, mathematics, and instruction following, drawing on specialized expert models in those domains. It excels at generating long texts, understanding structured data, and producing structured outputs such as JSON, with robust multilingual support covering over 29 languages.


Qwen2.5-3B-Instruct Overview

This model is the instruction-tuned 3.09 billion parameter variant from the Qwen2.5 series, developed by the Qwen Team. It builds upon previous Qwen models with substantial enhancements across several key areas, making it a versatile choice for various NLP tasks.

Key Capabilities & Improvements

  • Enhanced Knowledge & Reasoning: Significantly improved capabilities in coding and mathematics, leveraging specialized expert models.
  • Superior Instruction Following: Demonstrates marked improvements in adhering to instructions and is more resilient to diverse system prompts, aiding in role-play and chatbot implementations.
  • Long Context & Generation: Supports a full context length of 32,768 tokens and can generate outputs up to 8,192 tokens, ideal for complex and extended text tasks.
  • Structured Data Handling: Excels at understanding structured data, such as tables, and at generating structured outputs, particularly JSON (see the usage sketch after this list).
  • Multilingual Support: Offers robust support for over 29 languages, including major global languages like Chinese, English, French, Spanish, German, and Japanese.
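
As a quick illustration, here is a minimal usage sketch built on the standard Hugging Face transformers chat API. The prompt and the generation budget are illustrative assumptions, not values prescribed by the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ehristoforu/coolqwen-3b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative prompt asking the model for structured (JSON) output.
messages = [
    {"role": "system", "content": "You are a helpful assistant. Respond only with valid JSON."},
    {"role": "user", "content": "List three prime numbers with a one-line fact about each."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The card cites a generation limit of 8,192 tokens; we stay well below it here.
output_ids = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```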

Architecture & Training

The model is a causal language model built on the Transformer architecture, incorporating RoPE positional embeddings, SwiGLU activations, RMSNorm, and attention QKV bias. It has 36 layers and 16 query attention heads, with 2 key/value heads in a grouped-query attention (GQA) configuration. The model underwent both pre-training and post-training stages to achieve its enhanced performance.
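
To make the GQA configuration concrete, the sketch below shows how 2 key/value heads can be shared across 16 query heads, mirroring the head counts stated above. The sequence length and head dimension are illustrative assumptions, not values taken from the model config:

```python
import torch

# Head counts from the card: 16 query heads, 2 KV heads (GQA),
# so each KV head is shared by 8 query heads.
num_q_heads, num_kv_heads = 16, 2
head_dim, seq_len = 128, 32  # illustrative dimensions
group_size = num_q_heads // num_kv_heads  # 8 query heads per KV head

q = torch.randn(1, num_q_heads, seq_len, head_dim)
k = torch.randn(1, num_kv_heads, seq_len, head_dim)
v = torch.randn(1, num_kv_heads, seq_len, head_dim)

# Expand each KV head across its group of query heads, then run
# ordinary causal scaled dot-product attention.
k = k.repeat_interleave(group_size, dim=1)  # -> (1, 16, seq_len, head_dim)
v = v.repeat_interleave(group_size, dim=1)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 32, 128])
```

The point of this arrangement is memory efficiency: the KV cache stores only 2 heads instead of 16, which matters for long-context inference at the 32k token limit.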