juhwanlee/experiment2-non-cause-v1

Text Generation | Model Size: 7B | Quantization: FP8 | Context Length: 4k | Published: Mar 5, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Concurrency Cost: 1

Model Overview

juhwanlee/experiment2-non-cause-v1 is a 7-billion-parameter large language model developed by Juhwan Lee. It is built on the Mistral-7B-v0.1 architecture, which combines Grouped-Query Attention, Sliding-Window Attention, and a byte-fallback BPE tokenizer, and it has been fine-tuned specifically for data ordering tasks on a randomly sampled subset of the Open-Orca dataset.
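
As a rough illustration of how such a checkpoint is typically used, the sketch below loads the model with the Hugging Face transformers library and generates a completion for an ordering-style prompt. The repository ID comes from this card; the dtype/device settings and the prompt format are assumptions, since the card does not document an instruction template.

```python
# Minimal usage sketch (assumed, not taken from the model card): load the checkpoint
# with transformers and generate from an ordering-style prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "juhwanlee/experiment2-non-cause-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # requires `accelerate`; places weights on available devices
)

# Illustrative prompt only; the card does not specify the expected prompt format.
prompt = (
    "Arrange the following steps in the correct order:\n"
    "- Pour the water over the grounds\n"
    "- Boil the water\n"
    "- Grind the coffee beans\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```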

Key Capabilities

  • Specialized Fine-tuning: Optimized for data ordering, setting it apart from general-purpose language models.
  • Mistral-7B-v0.1 Base: Leverages the efficient and performant architecture of Mistral-7B-v0.1.
  • Efficient Attention Mechanisms: Incorporates Grouped-Query Attention, which shrinks the key/value cache for faster decoding, and Sliding-Window Attention, which keeps attention cost bounded over long contexts (see the configuration sketch after this list).
  • Byte-fallback BPE Tokenizer: Falls back to raw bytes for characters outside the vocabulary, so arbitrary text can be encoded without unknown tokens.
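
These architectural features are inherited from the Mistral-7B-v0.1 base and are exposed through the model's configuration. The sketch below inspects them; the field names follow the standard transformers MistralConfig and are assumptions about this checkpoint rather than verified values.

```python
# Sketch (assumed): inspect the Mistral-style architecture fields of the checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("juhwanlee/experiment2-non-cause-v1")

# Grouped-Query Attention: fewer key/value heads than query heads shrinks the KV cache.
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)

# Sliding-Window Attention: each token attends to at most this many preceding tokens.
print("sliding window: ", config.sliding_window)

# Nominal maximum context length.
print("max positions:  ", config.max_position_embeddings)
```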

Training Details

The model was fine-tuned on 100,000 examples randomly sampled from the Open-Orca dataset, with the fine-tuning aimed specifically at its data ordering capabilities.
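
For context, the snippet below shows one way such a subset could be drawn with the Hugging Face datasets library; the dataset ID, seed, and sampling procedure are illustrative assumptions, as the card does not publish the author's preprocessing code.

```python
# Illustrative only (assumed): draw a 100,000-example random subset of Open-Orca.
from datasets import load_dataset

orca = load_dataset("Open-Orca/OpenOrca", split="train")

# Shuffle with a fixed seed, then keep the first 100,000 rows.
subset = orca.shuffle(seed=42).select(range(100_000))
print(subset)
```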

Good For

  • Research and experimentation in data ordering tasks.
  • Developers looking for a specialized model for sequence or data arrangement problems.
  • Applications requiring a fine-tuned model based on the Mistral architecture for specific data manipulation.

For more technical details, refer to the developer's GitHub.