TristanBehrens/bachinstruct-codellama7b

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Mar 26, 2024 · License: llama2 · Architecture: Transformer

TristanBehrens/bachinstruct-codellama7b is a 7 billion parameter causal language model fine-tuned by TristanBehrens from codellama/CodeLlama-7b-hf, inheriting that base model's 4096-token context length. It is optimized for instruction-following tasks, particularly those related to code generation and understanding, building on the foundational capabilities of CodeLlama.


Overview

TristanBehrens/bachinstruct-codellama7b is a 7 billion parameter instruction-tuned model, fine-tuned by TristanBehrens. It is based on the codellama/CodeLlama-7b-hf architecture, which provides a strong foundation for code-related tasks. The model was fine-tuned with axolotl version 0.4.0 using a LoRA configuration.

Key Capabilities

  • Code-centric Instruction Following: Fine-tuned on the TristanBehrens/bachinstruct dataset, suggesting an emphasis on following and responding to code-related instructions.
  • Efficient Deployment: Uses 8-bit quantization (load_in_8bit: true), reducing the memory footprint during inference.
  • LoRA Fine-tuning: Employs LoRA (Low-Rank Adaptation) with r=32 and alpha=16, an efficient fine-tuning approach that leaves the base model's weights frozen while adapting it to new instructions.
  • Context Length: Inherits the 4096-token sequence length from its CodeLlama base, suitable for handling moderately long code snippets or conversational turns.
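A minimal sketch of loading the model in 8-bit, matching the load_in_8bit: true setting noted above. It assumes the transformers and bitsandbytes packages are installed; the loading calls are standard Hugging Face APIs, but the generation settings are illustrative, not part of this card.

```python
# Sketch: load TristanBehrens/bachinstruct-codellama7b with 8-bit weights.
MODEL_ID = "TristanBehrens/bachinstruct-codellama7b"

def load_model():
    # Imports kept local so the sketch reads without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_8bit=True)  # mirrors load_in_8bit: true
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        quantization_config=quant_config,
        device_map="auto",  # place layers automatically on available devices
    )
    return tokenizer, model

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Illustrative generation loop; max_new_tokens is an arbitrary choice.
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Because the model fits in roughly 7 GB at 8-bit precision, this configuration is a reasonable starting point for single-GPU inference.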

Training Details

The model was trained for 1 epoch with a learning rate of 0.0002 using the AdamW optimizer, with 4 gradient accumulation steps and a micro batch size of 16, for a total train batch size of 128. Training used bf16 precision and flash attention for efficiency.
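The batch-size figures above compose as follows. The micro batch size and accumulation steps are from the card; the device count is an inference on my part, chosen so the product matches the reported total of 128.

```python
# Effective batch size arithmetic for the training run described above.
micro_batch_size = 16
gradient_accumulation_steps = 4

# Samples contributing to each optimizer step on a single device.
per_device_batch = micro_batch_size * gradient_accumulation_steps  # 64

# Assumption: two devices, since 64 * 2 equals the reported total of 128.
num_devices = 2
total_train_batch_size = per_device_batch * num_devices

print(per_device_batch, total_train_batch_size)  # 64 128
```

Gradient accumulation lets a large effective batch fit in memory: gradients from several micro batches are summed before a single optimizer update.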