tanspring/attn2_47c6ce9d-9e91-4ea2-b7a7-328d5569d3cd

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 4k · Architecture: Transformer

The tanspring/attn2_47c6ce9d-9e91-4ea2-b7a7-328d5569d3cd model is a 4 billion parameter instruction-tuned causal language model, fine-tuned from Microsoft's Phi-3-mini-128k-instruct. It uses a 4096-token context length and was trained with the TRL library. It is designed for general text generation tasks, particularly those requiring instruction-following capabilities.

Model Overview

This model, tanspring/attn2_47c6ce9d-9e91-4ea2-b7a7-328d5569d3cd, is a 4 billion parameter causal language model. It is a fine-tuned variant of the microsoft/Phi-3-mini-128k-instruct base model, a compact yet capable architecture designed for instruction-following tasks. Fine-tuning was performed with the TRL (Transformer Reinforcement Learning) library, specifically using Supervised Fine-Tuning (SFT).

Key Capabilities

  • Instruction Following: Inherits the instruction-following capabilities of its Phi-3-mini base.
  • Text Generation: Capable of generating coherent and contextually relevant text based on user prompts.
  • Compact Size: At 4 billion parameters, it offers a balance between performance and computational efficiency.
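For reference, the model can be loaded like any Hugging Face causal LM. The sketch below is a hypothetical usage example, not from the model card itself: the repo id is the one stated above, while the prompt and generation settings are illustrative assumptions.

```python
# Hypothetical inference sketch using Hugging Face Transformers.
# The repo id comes from the model card; everything else is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "tanspring/attn2_47c6ce9d-9e91-4ea2-b7a7-328d5569d3cd"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 matches the quantization listed in the card's metadata.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Chat-style input; Phi-3-based models ship a chat template the pipeline applies.
messages = [{"role": "user", "content": "Explain supervised fine-tuning in one sentence."}]
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"])
```

Keep prompts within the 4096-token context length noted above; inputs longer than that would need truncation.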

Training Details

The model was trained using Supervised Fine-Tuning (SFT) with the TRL library (version 0.17.0). Other framework versions used include Transformers 4.51.3, PyTorch 2.6.0, Datasets 3.5.0, and Tokenizers 0.21.1. Further details on the training run can be visualized via Weights & Biases.

Good for

  • Applications requiring a relatively small yet capable instruction-tuned model.
  • General-purpose text generation where a Phi-3-mini-based model is suitable.
  • Developers looking for a model fine-tuned with the TRL library.