juhwanlee/llmdo-Mistral-7B-case-c-v1
juhwanlee/llmdo-Mistral-7B-case-c-v1 is a 7-billion-parameter large language model developed by Juhwan Lee, based on the Mistral-7B-v0.1 architecture. The model is fine-tuned specifically for data ordering tasks using a dataset sampled from Open-Orca, and it inherits Mistral's architectural features, including Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer, making it a candidate for specialized data manipulation applications.
Model Overview
juhwanlee/llmdo-Mistral-7B-case-c-v1 is a 7-billion-parameter large language model developed by Juhwan Lee. It is built on the Mistral-7B-v0.1 architecture, which includes Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer, and it has been fine-tuned specifically for data ordering tasks.
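As a minimal sketch of how the model can be loaded, the snippet below uses the Hugging Face transformers library with the model's repository ID; the half-precision dtype and automatic device placement are illustrative assumptions, not documented requirements.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "juhwanlee/llmdo-Mistral-7B-case-c-v1"

# Load the tokenizer (Byte-fallback BPE) and the fine-tuned Mistral-7B weights.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # assumption: half precision to fit a 7B model on a single GPU
    device_map="auto",          # assumption: the accelerate package is installed
)
```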
Key Capabilities & Training
The primary focus of this model is data ordering. It was fine-tuned on 100,000 examples randomly sampled from the Open-Orca dataset, and this specialized training aims to optimize its performance for tasks requiring structured data arrangement.
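The model card does not specify exactly how the 100,000 examples were drawn. A minimal sketch of such a random sample, assuming the public Open-Orca/OpenOrca dataset on the Hugging Face Hub and an arbitrary shuffle seed, might look like this:

```python
from datasets import load_dataset

# Load the full OpenOrca training split (assumption: the public Open-Orca/OpenOrca repo).
orca = load_dataset("Open-Orca/OpenOrca", split="train")

# Shuffle with a fixed (arbitrary) seed and keep 100,000 examples,
# mirroring the randomly selected subset described above.
subset = orca.shuffle(seed=42).select(range(100_000))
print(subset)
```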
Performance Benchmarks
Evaluations on the Open LLM Leaderboard show the model's performance across various metrics:
- Avg.: 62.16
- AI2 Reasoning Challenge (25-Shot): 62.03
- HellaSwag (10-Shot): 83.55
- MMLU (5-Shot): 62.69
- TruthfulQA (0-shot): 45.82
- Winogrande (5-shot): 79.08
- GSM8k (5-shot): 39.80
Detailed results are available on the Hugging Face Open LLM Leaderboard.
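These scores come from the Open LLM Leaderboard, which runs EleutherAI's lm-evaluation-harness. The snippet below is a rough sketch of reproducing one metric locally (the 25-shot ARC Challenge score); the harness version, batch size, and dtype are assumptions, and a local run may differ slightly from the leaderboard numbers.

```python
import lm_eval

# Evaluate the 25-shot AI2 Reasoning Challenge with the lm-evaluation-harness.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=juhwanlee/llmdo-Mistral-7B-case-c-v1,dtype=float16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=4,  # assumption: adjust to available GPU memory
)
print(results["results"]["arc_challenge"])
```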
When to Use This Model
This model is particularly suited to use cases that involve data ordering, or that call for a Mistral-7B base model fine-tuned for structured data manipulation. Its specialized training makes it a candidate for applications where the arrangement and sequencing of data are critical.
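The repository does not document a prompt format for data ordering, so the example below is only a hypothetical usage sketch: it sends an instruction-style ordering request through the standard generate API, and the prompt wording is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "juhwanlee/llmdo-Mistral-7B-case-c-v1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical data-ordering prompt; the actual fine-tuning prompt format is not documented.
prompt = (
    "Order the following steps for making tea from first to last:\n"
    "- Pour the water over the tea\n"
    "- Boil the water\n"
    "- Steep for three minutes\n"
    "- Fill the kettle\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Print only the newly generated continuation, not the echoed prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```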