cfierro/llama-3.1-8b-fft-othello-snake-fixed-prefix-2e-5

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 9, 2026 | License: llama3.1 | Architecture: Transformer | Cold

cfierro/llama-3.1-8b-fft-othello-snake-fixed-prefix-2e-5 is an 8-billion-parameter causal language model based on Llama 3.1 and fine-tuned by cfierro to play Othello and Snake. Training used an input_output masking strategy: a fixed prefix is supplied as context, and the model learns to generate the game moves that follow it.

Model Overview

This model, cfierro/llama-3.1-8b-fft-othello-snake-fixed-prefix-2e-5, is a full fine-tune (FFT) of Meta's Llama 3.1-8B-Instruct (8 billion parameters) by cfierro, targeted at generating moves for Othello and Snake.

Key Capabilities

  • Game-Specific Intelligence: Trained on the cfierro/othello-snake-llama3-fixed-prefix dataset of 50,000 game instances, enabling it to generate moves for Othello and Snake.
  • Fixed Prefix Training: Uses an input_output masking strategy in which a constant game prefix is provided as context but masked out of the loss, so the model is trained only to predict the subsequent game moves and the end_of_text token.
  • Full Fine-Tuning: Unlike a LoRA adapter, which trains a small set of added low-rank weights, this model was fully fine-tuned, so all of its weights were adapted to the game-playing task.
  • Efficient Training: Employed techniques like sample packing for short game sequences and DeepSpeed ZeRO Stage 3 for multi-GPU training of the 8B parameter model.
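The fixed-prefix masking and sample packing described above can be sketched in plain Python. This is a minimal illustration, not the Axolotl implementation: the token ids are made up, `mask_prefix_labels` and `pack_sequences` are hypothetical helper names, and the first-fit packing shown here is a stand-in for whatever packing algorithm the training run actually used. The only convention taken as given is that a label of -100 is ignored by PyTorch's cross-entropy loss by default.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_prefix_labels(
    prefix_ids: List[int], move_ids: List[int], eos_id: int
) -> Tuple[List[int], List[int]]:
    """Build (input_ids, labels) so that only moves and the end token are trained on."""
    input_ids = list(prefix_ids) + list(move_ids) + [eos_id]
    # Prefix positions get IGNORE_INDEX, so they add context but no loss.
    labels = [IGNORE_INDEX] * len(prefix_ids) + list(move_ids) + [eos_id]
    return input_ids, labels


def pack_sequences(lengths: List[int], max_len: int) -> List[List[int]]:
    """First-fit packing: group sequence indices into bins of at most max_len tokens."""
    bins: List[List[int]] = []
    free: List[int] = []  # remaining capacity of each bin
    for idx, n in enumerate(lengths):
        if n > max_len:
            raise ValueError(f"sequence {idx} has {n} tokens, more than {max_len}")
        for b, capacity in enumerate(free):
            if n <= capacity:
                bins[b].append(idx)
                free[b] -= n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([idx])
            free.append(max_len - n)
    return bins


# Made-up ids: a 3-token prefix, 2 move tokens, and an end token with id 99.
input_ids, labels = mask_prefix_labels([1, 2, 3], [10, 11], eos_id=99)
packed = pack_sequences([5, 4, 3, 2], max_len=8)
```

Packing matters here because individual games are short: placing several of them in one training sequence avoids spending most of each batch on padding.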

Training Details

The model was trained for 3 epochs (951 steps in total) at a learning rate of 2e-05 with the 8-bit AdamW optimizer from bitsandbytes (adamw_bnb_8bit), reaching a final validation loss of 0.2668. Training was configured and run with Axolotl.
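As a quick sanity check on these figures, the per-epoch step count can be worked out directly. This assumes the 951 steps are optimizer steps divided evenly across the 3 epochs, which the source does not state explicitly.

```python
total_steps = 951  # from the training summary
epochs = 3

# Assumption: every epoch covers the same number of optimizer steps.
steps_per_epoch, remainder = divmod(total_steps, epochs)
print(steps_per_epoch, remainder)  # 317 0
```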

Intended Uses

  • Game AI Research: Suited to researchers studying specialized language models for game playing, both in board games such as Othello and in sequential decision-making tasks such as Snake.
  • Demonstrations: Can be used to demonstrate how LLMs can be fine-tuned for highly specific, structured tasks beyond general conversational abilities.
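For either use, a minimal inference sketch with the Hugging Face transformers library might look like the following. It assumes the weights are available under the repository name above, that greedy decoding is appropriate, and that the caller supplies the same constant prefix used in training; the prefix text is not given here, so it is left as a parameter. `load_model` and `generate_moves` are illustrative helper names, not part of any library.

```python
MODEL_ID = "cfierro/llama-3.1-8b-fft-othello-snake-fixed-prefix-2e-5"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model; imports are local so the module loads without torch."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


def generate_moves(tokenizer, model, prefix: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation of the prefix and return only the new text."""
    inputs = tokenizer(prefix, return_tensors="pt")
    prompt_len = len(inputs["input_ids"][0])
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the generated moves are decoded.
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)


# Usage (downloads the 8B weights; the prefix must match the training prefix):
#   tokenizer, model = load_model()
#   print(generate_moves(tokenizer, model, prefix))
```

Because the prefix was loss-masked during training, output quality is likely to depend on reproducing that prefix exactly; the dataset cfierro/othello-snake-llama3-fixed-prefix is the place to look for it.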

Limitations

As a highly specialized model, its utility is primarily confined to the Othello and Snake game domains it was trained on. It is not intended for general-purpose language generation or instruction following outside of these specific contexts.