Model Overview
shibing624/llama-3-8b-instruct-262k-chinese is an 8-billion-parameter instruction-tuned model built on the Llama-3-8B-Instruct-262k base model. It was fine-tuned by shibing624 with ORPO (Odds Ratio Preference Optimization) on the shibing624/DPO-En-Zh-20k-Preference dataset, improving its performance in both English and Chinese.
Key Capabilities
- Extended Context Length: Supports an ultra-long context of 262,144 tokens, making it highly effective for Retrieval Augmented Generation (RAG) and processing extensive documents.
- Bilingual Support: Proficient in both Chinese and English, enabling versatile applications in multilingual environments.
- Multi-turn Conversation: Designed for engaging in multi-turn dialogues.
- Code and Reasoning: Demonstrates strong coding and general reasoning capabilities, particularly for English-language knowledge.
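The multi-turn conversation support follows Llama 3's published chat format. A minimal sketch of assembling such a prompt in plain Python (in practice you would call the tokenizer's `apply_chat_template`; the helper name here is illustrative):

```python
# Sketch: build a Llama-3-style multi-turn prompt by hand.
# The special tokens below follow Llama 3's published chat format;
# in practice, prefer tokenizer.apply_chat_template from transformers.

def build_llama3_prompt(messages):
    """messages: list of {"role": ..., "content": ...} dicts."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
                     f"{msg['content']}<|eot_id|>")
    # Open an assistant header to cue the model's next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

chat = [
    {"role": "system", "content": "You are a helpful bilingual assistant."},
    {"role": "user", "content": "Summarize this document in Chinese."},
]
prompt = build_llama3_prompt(chat)
print(prompt.startswith("<|begin_of_text|>"))  # True
```

The same structure repeats for each additional user/assistant turn, which is what makes multi-turn dialogue a simple concatenation of formatted messages.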
Considerations for Use
While offering significant advantages in context length and bilingual support, the model has certain limitations:
- Model Size: As an 8B-parameter model, it may hallucinate noticeably in knowledge-based Q&A, especially on Chinese historical or classical topics, a common limitation of Llama-based models.
- Resource Requirements: Inference requires substantial GPU memory:
  - FP16/BF16: 18.66 GB (encoding), 24.58 GB (generation)
  - Int4: 9.21 GB (encoding), 14.62 GB (generation)
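As a rough sanity check on those figures, weight memory alone can be estimated as parameter count times bytes per parameter (a back-of-the-envelope sketch; the measured numbers are higher because they also include activations and the KV cache):

```python
# Back-of-the-envelope GPU memory estimate for model weights only.
# Measured figures above are larger because they also cover
# activations and the KV cache for the long context.

BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "int4": 0.5}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate weight memory in GiB for n_params parameters."""
    return n_params * BYTES_PER_PARAM[dtype] / 1024**3

n = 8.03e9  # ~8B parameters in Llama-3-8B
print(f"fp16 weights: ~{weight_memory_gb(n, 'fp16'):.1f} GiB")  # ~15.0 GiB
print(f"int4 weights: ~{weight_memory_gb(n, 'int4'):.1f} GiB")  # ~3.7 GiB
```

This makes the roughly 2x memory saving of Int4 over FP16/BF16 in the figures above easy to see: the quantized weights take a quarter of the space, while the context-dependent overhead shrinks less.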
This model is particularly well-suited for applications requiring extensive context processing in a bilingual (Chinese-English) setting, where its long context window can be fully leveraged.
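For the long-context RAG use case, a sketch of greedily packing retrieved passages into the 262,144-token window (token counts are approximated at roughly 4 characters per token; a real pipeline would count with the model's tokenizer, and the helper names are illustrative):

```python
# Sketch: greedily pack retrieved passages into the 262,144-token
# context window. Token counts are approximated as len(text) // 4;
# a real pipeline would count exactly with the model's tokenizer.

CONTEXT_WINDOW = 262_144
RESERVED_FOR_OUTPUT = 4_096  # leave room for the generated answer

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def pack_passages(passages, budget=CONTEXT_WINDOW - RESERVED_FOR_OUTPUT):
    """Return the prefix of passages that fits within the token budget."""
    packed, used = [], 0
    for p in passages:
        cost = approx_tokens(p)
        if used + cost > budget:
            break
        packed.append(p)
        used += cost
    return packed, used

docs = ["passage " * 200] * 5  # five ~400-token dummy passages
kept, used = pack_passages(docs)
print(len(kept), used)  # all five fit comfortably in the window
```

With a 262k-token budget, hundreds of retrieved passages can be placed in a single prompt, which is exactly where this model's context length pays off over standard 8k-context Llama-3 variants.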