LL-Square/LLSquare-7B-Instruct
LLSquare-7B-Instruct is a 7.61 billion parameter instruction-tuned causal language model developed by LL-Square, built upon a transformer architecture with RoPE, SwiGLU, and RMSNorm. It features significantly improved capabilities in coding, mathematics, and instruction following, along with enhanced long text generation and structured data understanding. The model supports a context length of up to 131,072 tokens and is optimized for multilingual applications across over 29 languages.
Loading preview...
LLSquare-7B-Instruct Overview
LLSquare-7B-Instruct is a 7.61 billion parameter instruction-tuned causal language model from the LL-Square family, developed by LL-Square. It represents a significant advancement over previous iterations, focusing on enhanced performance across several key domains.
Key Capabilities & Improvements
- Enhanced Knowledge & Reasoning: Demonstrates greatly improved capabilities in coding and mathematics, leveraging specialized expert models.
- Instruction Following: Features significant improvements in adhering to instructions and generating coherent, relevant responses.
- Long Text & Structured Data Handling: Excels at generating long texts (up to 8K tokens) and understanding structured data like tables, including generating structured outputs such as JSON.
- Robust Prompt Handling: More resilient to diverse system prompts, which benefits role-play implementations and chatbot condition-setting.
- Extended Context Length: Supports a full context length of up to 131,072 tokens, with generation capabilities up to 8,192 tokens, utilizing techniques like YaRN for long text processing.
- Multilingual Support: Offers comprehensive support for over 29 languages, including major global languages like Chinese, English, French, Spanish, Japanese, and Korean.
Architecture & Features
This model is built on a transformer architecture incorporating RoPE, SwiGLU, RMSNorm, and Attention QKV bias. It has 28 layers and 28 attention heads (with 4 for KV in GQA configuration). The model is designed for efficient processing within the Hugging Face transformers library, with specific recommendations for using the latest versions.