cfierro/llama-3.1-8b-fft-othello-snake-fixed-prefix-2e-5
cfierro/llama-3.1-8b-fft-othello-snake-fixed-prefix-2e-5 is an 8-billion-parameter causal language model based on Llama 3.1 and fine-tuned by cfierro to play Othello and Snake. Training used an input_output masking strategy, so the model learns to generate game moves that continue a fixed prefix.
Model Overview
This model, cfierro/llama-3.1-8b-fft-othello-snake-fixed-prefix-2e-5, is a specialized 8-billion-parameter variant of Meta's Llama 3.1-8B-Instruct. cfierro trained it with full fine-tuning (FFT) to play Othello and Snake.
Key Capabilities
- Game-Specific Intelligence: Trained on a unique dataset (cfierro/othello-snake-llama3-fixed-prefix) comprising 50,000 game instances, enabling it to generate moves for Othello and Snake.
- Fixed Prefix Training: Utilizes an input_output masking strategy where a constant game prefix is provided as context (loss-masked), and the model is trained to predict subsequent game moves and the end_of_text token.
- Full Fine-Tuning: Unlike LoRA, this model was fully fine-tuned, allowing for comprehensive adaptation of its weights to the specific game-playing task.
- Efficient Training: Employed techniques like sample packing for short game sequences and DeepSpeed ZeRO Stage 3 for multi-GPU training of the 8B parameter model.
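The fixed-prefix masking described above can be illustrated with a minimal sketch. It assumes the common convention of marking ignored positions with the label -100 (the default ignore_index of PyTorch's CrossEntropyLoss). The token ids, the EOS id, and the helper name build_labels are hypothetical and are not taken from the actual training code.

```python
# Label value that PyTorch's CrossEntropyLoss skips by default.
IGNORE_INDEX = -100


def build_labels(prefix_ids, move_ids, eos_id):
    """Return (input_ids, labels) for one training example.

    The prefix is visible to the model as context but contributes no loss;
    the moves and the final end-of-text token are the prediction targets.
    """
    input_ids = list(prefix_ids) + list(move_ids) + [eos_id]
    labels = [IGNORE_INDEX] * len(prefix_ids) + list(move_ids) + [eos_id]
    return input_ids, labels


# Toy ids for illustration only (not real Llama 3.1 token ids).
prefix_ids = [101, 102, 103]   # constant game prefix
move_ids = [7, 8]              # moves the model should learn to produce
EOS_ID = 0                     # stand-in for the end_of_text token

input_ids, labels = build_labels(prefix_ids, move_ids, EOS_ID)
print(input_ids)  # [101, 102, 103, 7, 8, 0]
print(labels)     # [-100, -100, -100, 7, 8, 0]
```

Because the prefix positions carry the ignore label, gradient updates come only from the move tokens and the end-of-text token.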
Training Details
The model was trained for 3 epochs (951 steps in total) with a learning rate of 2e-05 and the 8-bit AdamW optimizer from bitsandbytes (AdamW_BNB). It reached a final validation loss of 0.2668. Axolotl was used to configure and manage the training run.
Intended Uses
- Game AI Research: Ideal for researchers exploring specialized language models for game playing, particularly in board games like Othello and sequential decision-making tasks like Snake.
- Demonstrations: Can be used to demonstrate how LLMs can be fine-tuned for highly specific, structured tasks beyond general conversational abilities.
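For either use, a minimal inference sketch with the Hugging Face transformers library might look like the following. It assumes transformers, torch, and accelerate are installed and that enough GPU memory is available for an 8B model in bfloat16. The PREFIX string is a placeholder: the real fixed prefix is defined by the training dataset and is not reproduced here. The helper names continuation_ids and generate_moves are hypothetical.

```python
MODEL_ID = "cfierro/llama-3.1-8b-fft-othello-snake-fixed-prefix-2e-5"

# Placeholder only: substitute the fixed prefix used in the training dataset.
PREFIX = "<fixed game prefix goes here>"


def continuation_ids(output_ids, prompt_len):
    """Drop the prompt tokens and keep only the newly generated ids."""
    return list(output_ids[prompt_len:])


def generate_moves(prefix=PREFIX, max_new_tokens=64):
    """Greedy-decode a continuation of the prefix and return it as text."""
    # Imported lazily so the helper above can be used without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires the accelerate package
    )

    inputs = tokenizer(prefix, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    prompt_len = inputs["input_ids"].shape[1]
    new_ids = continuation_ids(output[0].tolist(), prompt_len)
    return tokenizer.decode(new_ids, skip_special_tokens=True)
```

Calling generate_moves() downloads the weights on first use. Greedy decoding (do_sample=False) is chosen here only to make the output deterministic; it is not a documented recommendation for this model.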
Limitations
As a highly specialized model, its utility is primarily confined to the Othello and Snake game domains it was trained on. It is not intended for general-purpose language generation or instruction following outside of these specific contexts.