Name: cfierro/llama-3.1-8b-fft-othello-snake-fixed-prefix-2e-5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cfierro

Model Overview

cfierro/llama-3.1-8b-fft-othello-snake-fixed-prefix-2e-5 is an 8-billion-parameter model derived from Meta's Llama 3.1-8B-Instruct. cfierro fully fine-tuned (FFT) it to generate moves for two games, Othello and Snake.

Key Capabilities

Game-Specific Intelligence: Trained on the cfierro/othello-snake-llama3-fixed-prefix dataset of 50,000 game instances, which enables it to generate moves for Othello and Snake.
Fixed Prefix Training: Uses an input_output masking strategy in which a constant game prefix is supplied as context and excluded from the loss (loss-masked), so the model is trained only to predict the subsequent game moves and the end_of_text token.
Full Fine-Tuning: All of the model's weights were updated, rather than a small set of adapter weights as in LoRA, so the whole network could adapt to the game-playing task.
Efficient Training: Used sample packing to fit multiple short game sequences into each training sequence, and DeepSpeed ZeRO Stage 3 to train the 8B-parameter model across multiple GPUs.
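
The fixed-prefix masking described above can be illustrated with a minimal sketch. It is not the author's training code: the token ids and the `build_labels` helper are hypothetical, and the only convention assumed is that a label of -100 is skipped by the loss, which is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def build_labels(prefix_ids, move_ids, eos_id):
    """Return (input_ids, labels) for one game, with the prefix excluded from the loss.

    Hypothetical helper for illustration; not taken from the model's training code.
    """
    input_ids = list(prefix_ids) + list(move_ids) + [eos_id]
    # Prefix positions get IGNORE_INDEX; moves and the end-of-text token keep their ids.
    labels = [IGNORE_INDEX] * len(prefix_ids) + list(move_ids) + [eos_id]
    return input_ids, labels


# Made-up token ids, for illustration only.
prefix_ids = [101, 102, 103]   # stands in for the constant game prefix
move_ids = [7, 8, 9, 10]       # stands in for the tokenized moves
EOS_ID = 2                     # stands in for the end_of_text token id

input_ids, labels = build_labels(prefix_ids, move_ids, EOS_ID)
trained_positions = sum(1 for label in labels if label != IGNORE_INDEX)

print(input_ids)
print(labels)
print(trained_positions)
```

In this sketch only the four move tokens and the end-of-text token contribute to the loss; the prefix is still visible to the model as context.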

Training Details

The model was trained for 3 epochs with a learning rate of 2e-05, using the 8-bit AdamW optimizer from bitsandbytes (AdamW_BNB 8-bit). Training ran for 951 steps and reached a final validation loss of 0.2668. Axolotl was used to configure and run the training.
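
As a rough sanity check on these figures, the short sketch below derives the number of steps per epoch. It assumes that the 951 steps cover all 3 epochs. The games-per-step figure is only an upper bound under the further assumption that all 50,000 games were in the training split; the actual train/validation split and batch configuration are not stated here.

```python
TOTAL_STEPS = 951      # reported number of training steps
NUM_EPOCHS = 3         # reported number of epochs
TOTAL_GAMES = 50_000   # reported dataset size (assumed here to be all training data)

# Assumption: the 951 steps span all three epochs.
steps_per_epoch, remainder = divmod(TOTAL_STEPS, NUM_EPOCHS)

# Upper bound on the average number of games consumed per step in one epoch.
games_per_step = TOTAL_GAMES / steps_per_epoch

print(steps_per_epoch, remainder)
print(round(games_per_step, 1))
```

The step count divides evenly by the number of epochs, which is consistent with each epoch running the same number of steps.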

Intended Uses

Game AI Research: Ideal for researchers exploring specialized language models for game playing, particularly in board games like Othello and sequential decision-making tasks like Snake.
Demonstrations: Can be used to demonstrate how LLMs can be fine-tuned for highly specific, structured tasks beyond general conversational abilities.

Limitations

The model is highly specialized, and its usefulness is largely confined to the Othello and Snake domains it was trained on. It is not intended for general-purpose language generation or for instruction following outside these contexts.
