juhwanlee/gemma-7B-alpaca-case-0-2

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 8.5B | Quant: FP8 | Ctx Length: 8k | Published: Mar 25, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

juhwanlee/gemma-7B-alpaca-case-0-2 is an 8.5-billion-parameter large language model developed by Juhwan Lee. Based on the Mistral-7B-v0.1 architecture, it incorporates Grouped-Query Attention, Sliding-Window Attention, and a byte-fallback BPE tokenizer. The model is fine-tuned for data ordering tasks on a random sample of the Open-Orca dataset.

Model Overview

This model, developed by Juhwan Lee, is an 8.5-billion-parameter large language model (LLM) built on the Mistral-7B-v0.1 architecture and fine-tuned for data ordering tasks.

Key Architectural Features

The underlying Mistral-7B-v0.1 architecture includes several notable design choices:

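  • Grouped-Query Attention: Groups of query heads share a single key/value head, which shrinks the key/value cache and speeds up inference.
  • Sliding-Window Attention: Each token attends only to a fixed-size window of preceding tokens, which bounds the attention cost on longer sequences.
  • Byte-fallback BPE tokenizer: Characters missing from the vocabulary are encoded as raw bytes, so no input maps to an unknown token.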
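
The first two features can be illustrated with a minimal, framework-free sketch. It is not the model's implementation: the function names are hypothetical, and the sequence length, window size, and head counts are small illustrative values rather than the model's actual configuration.

```python
from __future__ import annotations


def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    A position sees itself and at most `window - 1` earlier positions.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one key/value head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


# Illustrative values only (assumptions, not the model's configuration).
mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

# Which key/value head each of 8 query heads reads from when there are 2 KV heads.
kv_heads = [kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)]
print(kv_heads)
```

Each printed row is one query position: an `x` marks a key position it may attend to, so the band of `x` characters stops growing once the window is full.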

Training Details

The model was fine-tuned on a random sample of 100,000 data points drawn from the Open-Orca dataset. This fine-tuning targets data ordering applications.
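
Drawing a fixed-size random sample can be sketched as below. The card does not state how the sample was taken, so the sampling method (uniform, without replacement), the seed, and the `dataset_size` value are all assumptions; `sample_indices` is a hypothetical helper, and only the sample size of 100,000 comes from the description above.

```python
import random


def sample_indices(dataset_size: int, sample_size: int, seed: int) -> list[int]:
    """Return `sample_size` distinct row indices drawn uniformly at random."""
    if sample_size > dataset_size:
        raise ValueError("sample_size cannot exceed dataset_size")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(range(dataset_size), sample_size)


# dataset_size is a placeholder, not the real size of Open-Orca.
indices = sample_indices(dataset_size=1_000_000, sample_size=100_000, seed=0)
print(len(indices), len(set(indices)))
```

The returned indices would then select rows from the loaded dataset; keeping the seed fixed lets the same subset be reconstructed later.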

Good For

  • Data Ordering Tasks: Its primary intended use case due to specialized fine-tuning.
  • Research and Experimentation: For developers interested in models fine-tuned on specific data ordering methodologies.
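
For experimentation, the model could be loaded roughly as follows. This is a sketch under assumptions: it presumes the weights are downloadable under the repository id shown on this page, that the `transformers` and `torch` packages are installed, and that plain-text prompts are acceptable, since the card does not document a prompt format. The `generate` helper and its `max_new_tokens` default are hypothetical.

```python
MODEL_ID = "juhwanlee/gemma-7B-alpaca-case-0-2"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and return its completion for `prompt`.

    Imports are deferred so that defining this function does not require
    transformers to be installed. The weights are downloaded on first call.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (commented out because it downloads several gigabytes):
# print(generate("Sort these numbers in ascending order: 7, 2, 9"))
```

The function reloads the model on every call to keep the sketch short; a longer-lived script would load the tokenizer and model once and reuse them.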

License

This model is released under the Apache License 2.0.