arcee-ai/myalee-v3-L31-8B
arcee-ai/myalee-v3-L31-8B is an 8 billion parameter instruction-tuned causal language model, fine-tuned from Crystalcareai/Meta-llama-3.1-8b-instruct. It supports a 32,768-token context length and was trained with Axolotl on an Alpaca-format instruction dataset and mlabonne/FineTome-100k. The model is designed for general instruction-following tasks, building on the capabilities of the Llama 3.1 architecture.
Model Overview
arcee-ai/myalee-v3-L31-8B is an 8 billion parameter instruction-tuned language model, developed by arcee-ai. It is a fine-tuned variant of the Crystalcareai/Meta-llama-3.1-8b-instruct base model, built using the Axolotl framework.
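For quick experimentation, the model can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch, assuming the checkpoint is available on the Hub under the ID above and that a GPU with enough memory for an 8B model in bfloat16 is at hand.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/myalee-v3-L31-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 8B weights manageable on one GPU
    device_map="auto",           # requires the accelerate package
)
```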
Key Training Details
This model was trained with a focus on instruction-following, utilizing a combination of datasets including /workspace/data/myalee (Alpaca format) and mlabonne/FineTome-100k (ShareGPT format). Key training hyperparameters include:
- Learning Rate: 2e-05
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Epochs: 4
- Gradient Accumulation Steps: 8
- Sequence Length: 8192 (with sample packing enabled)
- Flash Attention: Enabled for efficiency
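The training run itself was configured through Axolotl's YAML interface, which is not reproduced here. As a rough illustration only, the hyperparameters above map onto transformers.TrainingArguments as sketched below; the per-device batch size, precision, and output path are assumptions, not values from the card.

```python
from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters; the original run used Axolotl.
training_args = TrainingArguments(
    output_dir="./myalee-v3-L31-8B-ft",  # hypothetical output path
    learning_rate=2e-5,
    num_train_epochs=4,
    gradient_accumulation_steps=8,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    per_device_train_batch_size=1,       # assumption: not stated in the card
    bf16=True,                           # assumption: common for Llama 3.1 fine-tunes
)
# Sequence length (8192), sample packing, and Flash Attention are configured in the
# data pipeline and model loading rather than in TrainingArguments.
```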
Architectural Enhancements
The fine-tuning run unfroze a specific set of parameters: lm_head.weight, model.embed_tokens.weight, and the input_layernorm, post_attention_layernorm, mlp (down_proj, gate_proj, up_proj), and self_attn (k_proj, o_proj, q_proj, v_proj) modules in selected transformer blocks. This selective unfreezing is intended to adapt the model to new data while preserving the foundational knowledge of the Llama 3.1 base.
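In plain PyTorch, this kind of selective unfreezing amounts to toggling requires_grad by parameter name. The sketch below is illustrative only: it applies the pattern to every transformer block, whereas the actual run unfroze these modules only in specific blocks, and the substring list is an approximation rather than the original Axolotl setting.

```python
# Hypothetical approximation of the unfrozen parameter groups described above.
UNFROZEN_SUBSTRINGS = (
    "lm_head.weight",
    "model.embed_tokens.weight",
    "input_layernorm",
    "post_attention_layernorm",
    "mlp.down_proj", "mlp.gate_proj", "mlp.up_proj",
    "self_attn.k_proj", "self_attn.o_proj", "self_attn.q_proj", "self_attn.v_proj",
)

def apply_selective_unfreezing(model):
    """Freeze all weights, then re-enable gradients only for the target groups."""
    for name, param in model.named_parameters():
        param.requires_grad = any(s in name for s in UNFROZEN_SUBSTRINGS)
    return model
```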
Intended Use
The upstream README does not detail specific intended uses or limitations. As an instruction-tuned model based on Llama 3.1, however, it is generally suitable for a wide range of natural language processing tasks that require conversational ability, text generation, summarization, and question answering.
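A typical interaction uses the Llama 3.1 chat template. The sketch below assumes the model and tokenizer were loaded as shown earlier; the prompt is only an example.

```python
messages = [
    {"role": "user", "content": "Summarize the key ideas of transfer learning in two sentences."},
]

# Build the prompt with the model's chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```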