thomas-yanxin/XinYuan-Qwen2.5-7B-0917

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Published: Sep 17, 2024 · License: Other · Architecture: Transformer

XinYuan-Qwen2.5-7B-0917 is a 7.6 billion parameter language model developed by thomas-yanxin, based on the Qwen2.5 architecture. Its primary purpose is to validate the impact of high-quality, meticulously extracted SFT data on model performance. It demonstrates strong capabilities across English, Chinese, math, and code benchmarks, with scores of 73.72 on MMLU, 81.02 on C-EVAL, 82.94 on GSM8K, and 83.99 on HumanEval.


XinYuan-Qwen2.5-7B-0917: Data Quality Validation Model

XinYuan-Qwen2.5-7B-0917 is a 7.6 billion parameter model built upon the Qwen2.5 architecture, developed by thomas-yanxin. Its core purpose is to demonstrate the significant impact of high-quality, meticulously governed SFT (Supervised Fine-Tuning) data on model performance. The developers emphasize that superior data quality alone can lead to substantial improvements in model results, even without complex training methodologies.
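Since the model follows the Qwen2.5 architecture, it should load with the standard Hugging Face `transformers` causal-LM API. The sketch below is illustrative, not an official usage snippet from the model card: it assumes the checkpoint is published under the repo id shown above and ships a standard chat template; the `build_messages` helper and the system prompt are our own conventions for the example.

```python
MODEL_ID = "thomas-yanxin/XinYuan-Qwen2.5-7B-0917"


def build_messages(user_prompt: str) -> list[dict]:
    """Assemble a chat-format request (system turn + user turn)."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]


def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and run one chat completion.

    Assumes `transformers` (and a compatible torch) is installed; the
    7.6B checkpoint needs a GPU or substantial RAM to run.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Render the chat turns into the model's prompt format.
    text = tokenizer.apply_chat_template(
        build_messages(user_prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("Briefly explain what SFT data quality means."))
```

The heavy imports live inside `generate` so the message-construction helper can be inspected without pulling in `transformers`.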

Key Capabilities & Performance

This model exhibits robust performance across a diverse set of benchmarks, indicating strong general-purpose capabilities:

  • English Language Understanding: Achieves 73.72 on MMLU, 33.04 on GPQA, 67.55 on BBH, and 91.19 on ARC-C.
  • Chinese Language Understanding: Scores 81.02 on C-EVAL and 80.06 on CMMLU.
  • Mathematical Reasoning: Demonstrates proficiency with 82.94 on GSM8K and 41.06 on MATH.
  • Code Generation: Performs well on coding tasks, scoring 50.6 on MBPP and 83.99 on HumanEval.
  • Instruction Following: Achieves 40.48 on IFEval (Prompt Strict-Acc.).

Good For

  • Research into Data Governance: Ideal for researchers and developers interested in the impact of data quality on LLM performance.
  • General-Purpose Applications: Suitable for tasks requiring strong performance in English and Chinese language understanding, mathematical problem-solving, and code generation.
  • Benchmarking: Can serve as a strong baseline model for evaluating the effectiveness of different SFT datasets and methodologies.