Model Overview
Karajan42/open_llama_preview_gpt4 is a 7-billion-parameter Open Llama model fine-tuned with Low-Rank Adaptation (LoRA). The primary goal of this project was to assess the impact of a high-quality dataset on fine-tuning a partially pretrained model, and in particular to demonstrate that a robust LLM can be fine-tuned on consumer-grade hardware.
Key Characteristics
- Architecture: Open Llama, 7 billion parameters.
- Fine-tuning Method: Low-Rank Adaptation (LoRA) was employed to reduce the memory footprint and computational requirements, enabling training on consumer hardware (e.g., 3× RTX 3090 GPUs).
- Training Parameters: The model was trained for 3 epochs with a learning rate of 3e-4, a batch size of 4, and 4 gradient accumulation steps; 8-bit mode was not used.
- Dataset Focus: The project emphasizes the importance of dataset quality in fine-tuning, with plans for future iterations using commercially viable datasets.
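LoRA's core idea, which makes this kind of consumer-hardware fine-tuning possible, is to freeze the pretrained weight matrix W and learn only a low-rank update B·A, so each adapted layer trains r·(d_in + d_out) parameters instead of d_in·d_out. A minimal illustrative sketch in plain Python (toy dimensions, not the actual training code for this model):

```python
# Illustrative LoRA forward pass: y = W x + (alpha / r) * B (A x).
# W is frozen; only the small matrices A and B are trainable.

def matmul(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus the scaled low-rank update."""
    base = matmul(W, x)
    delta = matmul(B, matmul(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

# Toy dimensions: d_out = 2, d_in = 3, rank r = 1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]   # frozen pretrained weight (2x3)
A = [[0.5, 0.5, 0.5]]   # trainable down-projection (1x3)
B = [[0.0], [0.0]]      # trainable up-projection, zero-initialized (2x1)
x = [1.0, 2.0, 3.0]

# With B = 0 the adapter is a no-op, so fine-tuning starts exactly
# at the pretrained model's behavior.
print(lora_forward(W, A, B, x, alpha=16, r=1))  # [1.0, 2.0]

# Once training nudges B away from zero, the output shifts by a
# rank-1 update while W itself never changes.
B = [[0.1], [0.2]]
print(lora_forward(W, A, B, x, alpha=16, r=1))
```

Because B starts at zero, the adapted model is initially identical to the base model, which is why LoRA fine-tuning is stable even at the relatively high learning rate (3e-4) used here.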
Use Cases
This model is suitable for a variety of instruction-following tasks, including:
- Recipe Generation: Generates detailed recipes from user prompts.
- Factual Question Answering: Answers factual queries, such as questions about history or politics.
- Code Generation: Produces simple code snippets, such as Python programs that compute mathematical sequences.
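Each of these tasks is driven by a plain instruction prompt. A minimal sketch of how such tasks might be framed, assuming an Alpaca-style template (common for GPT-4 instruction datasets, but not confirmed by this card — adjust to whatever format the model was actually trained on):

```python
# Hedged sketch: wrap a bare instruction in an assumed Alpaca-style
# template before passing it to the model for generation.

def build_prompt(instruction: str) -> str:
    """Format an instruction using an assumed Alpaca-style template."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

# One example prompt per use case listed above.
for task in (
    "Write a detailed recipe for vegetable soup.",
    "Who was the first President of the United States?",
    "Write a Python program that prints the first 10 Fibonacci numbers.",
):
    print(build_prompt(task))
```

The resulting string would be tokenized and passed to the model's generate method; the completion after "### Response:" is the model's answer.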