Karajan42/open_llama_preview_gpt4

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · License: apache-2.0 · Architecture: Transformer

Karajan42/open_llama_preview_gpt4 is a 7-billion-parameter Open Llama model fine-tuned with Low Rank Adaptation (LoRA) on a high-quality dataset. The model was developed to evaluate how dataset quality affects fine-tuning of a partially pretrained model, and demonstrates that a robust LLM can be trained on consumer hardware. It is suited to general text generation tasks, as shown by its ability to follow instructions for recipes, factual queries, and code generation.


Model Overview

Karajan42/open_llama_preview_gpt4 is a 7 billion parameter Open Llama model, fine-tuned using Low Rank Adaptation (LoRA). The primary goal of this project was to assess the impact of a high-quality dataset on the fine-tuning process of a partially pretrained model, specifically demonstrating the feasibility of training a robust LLM on consumer-grade hardware.

Key Characteristics

  • Architecture: Open Llama, 7 billion parameters.
  • Fine-tuning Method: LoRA (Low Rank Adaptation) was employed to reduce memory footprint and computational requirements, enabling training on consumer hardware (e.g., 3 x RTX 3090 GPUs).
  • Training Parameters: The model was trained for 3 epochs with a learning rate of 3e-4, a batch size of 4, and 4 gradient accumulation steps. 8-bit mode was not used.
  • Dataset Focus: The project emphasizes the importance of dataset quality in fine-tuning, with plans for future iterations using commercially viable datasets.
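The memory savings behind the LoRA setup above can be sketched numerically: instead of updating the full weight matrix W, LoRA freezes W and learns a low-rank update B @ A scaled by alpha/r, so only a small fraction of parameters are trainable. The shapes, rank, and alpha below are illustrative assumptions; the card does not state the values used for this model.

```python
import numpy as np

# Illustrative shapes and LoRA hyperparameters (assumptions; the card
# does not document the rank or alpha actually used).
d_out, d_in, r, alpha = 64, 64, 8, 16

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # trainable, initialized to zero

# Effective weight after fine-tuning: only A and B receive gradients,
# so trainable parameters drop from d_out*d_in to r*(d_in + d_out).
W_eff = W + (alpha / r) * (B @ A)

full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(f"trainable params: {lora_params} vs full fine-tune: {full_params}")
```

Because B starts at zero, W_eff equals W before any updates, so fine-tuning begins exactly at the pretrained model; this is what makes LoRA feasible on hardware like 3× RTX 3090 GPUs.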

Use Cases

This model is suitable for a variety of instruction-following tasks, including:

  • Recipe Generation: Capable of generating detailed recipes based on user prompts.
  • Factual Question Answering: Provides responses to factual queries, such as historical or political information.
  • Code Generation: Can generate simple code snippets, like Python programs for mathematical sequences.
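A minimal way to try these instruction-following tasks is the Hugging Face `transformers` text-generation pipeline. The Alpaca-style prompt template below is an assumption (the card does not specify the exact instruction format used during fine-tuning), and the sampling parameters are illustrative, not recommended settings.

```python
def build_prompt(instruction: str) -> str:
    # Alpaca-style instruction template (an assumption; the card does
    # not document the prompt format used during fine-tuning).
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )


def generate(instruction: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="Karajan42/open_llama_preview_gpt4")
    out = pipe(
        build_prompt(instruction),
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
    )
    return out[0]["generated_text"]


if __name__ == "__main__":
    print(generate("Write a short recipe for pancakes."))
```

Swapping the instruction string exercises the other use cases listed above, e.g. a factual question or a request for a small Python program.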