juhwanlee/gemma-7B-alpaca-case-0-2
juhwanlee/gemma-7B-alpaca-case-0-2 is an 8.5-billion-parameter large language model developed by Juhwan Lee. Built on the Mistral-7B-v0.1 architecture, it incorporates Grouped-Query Attention, Sliding-Window Attention, and a byte-fallback BPE tokenizer. The model is fine-tuned specifically for data ordering tasks, using a random sample of the Open-Orca dataset for training.
Model Overview
This model, developed by Juhwan Lee, is an 8.5-billion-parameter Large Language Model (LLM) built on the Mistral-7B-v0.1 architecture. It has been fine-tuned specifically for data ordering tasks.
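The following is a minimal usage sketch, assuming the checkpoint is hosted on the Hugging Face Hub under this repository id and loads through the standard transformers Auto classes. The dtype, device settings, and the data ordering prompt are illustrative assumptions, not documented specifics of this model.

```python
# Minimal loading and generation sketch (assumes the checkpoint is on the
# Hugging Face Hub under this repository id and works with the Auto classes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "juhwanlee/gemma-7B-alpaca-case-0-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit an ~8.5B-parameter model on one GPU
    device_map="auto",           # requires the accelerate package
)

# Hypothetical data ordering prompt; the exact prompt format used during
# fine-tuning is not documented here.
prompt = "Reorder the following items from smallest to largest: 42, 7, 19, 3."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```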
Key Architectural Features
The underlying Mistral-7B-v0.1 architecture includes several notable design choices (a configuration sketch follows the list):
- Grouped-Query Attention: Enhances efficiency and performance.
- Sliding-Window Attention: Optimizes context handling for longer sequences.
- Byte-fallback BPE tokenizer: Provides robust tokenization capabilities.
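These design choices are exposed as fields on the model configuration. The sketch below assumes the repository ships a standard Mistral-style config and inspects the relevant settings without downloading the weights; the field names are those used by transformers' MistralConfig.

```python
# Inspect architecture-related settings from the configuration alone
# (assumes a standard Mistral-style config is published with the model).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("juhwanlee/gemma-7B-alpaca-case-0-2")

# Grouped-Query Attention: fewer key/value heads than query heads.
print("query heads:    ", config.num_attention_heads)
print("key/value heads:", getattr(config, "num_key_value_heads", None))

# Sliding-Window Attention: attention is limited to this many recent tokens.
print("sliding window: ", getattr(config, "sliding_window", None))

# Tokenizer vocabulary size reported by the config.
print("vocab size:     ", config.vocab_size)
```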
Training Details
The model was fine-tuned on a random sample of 100,000 data points drawn from the Open-Orca dataset. This targeted fine-tuning aims to optimize performance on data ordering applications.
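The exact fine-tuning script is not published in this card, but the sketch below illustrates how such a 100,000-example random sample could be drawn with the Hugging Face datasets library. The dataset id, seed, and prompt formatting are assumptions for illustration, not the author's actual procedure.

```python
# Illustrative sampling sketch: draw a random 100,000-example subset of Open-Orca.
# The dataset id, seed, and formatting below are assumptions, not the author's
# actual training setup.
from datasets import load_dataset

dataset = load_dataset("Open-Orca/OpenOrca", split="train")
sample = dataset.shuffle(seed=42).select(range(100_000))

def to_text(example):
    # OpenOrca rows contain a system prompt, a question, and a response.
    return {
        "text": f"{example['system_prompt']}\n\n{example['question']}\n\n{example['response']}"
    }

train_data = sample.map(to_text)
print(train_data[0]["text"][:200])
```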
Good For
- Data Ordering Tasks: Its primary intended use case due to specialized fine-tuning.
- Research and Experimentation: For developers interested in models fine-tuned on specific data ordering methodologies.
License
This model is released under the Apache License 2.0.