mjf-su/NewSFTModel

VISIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:May 17, 2026Architecture:Transformer Cold

The mjf-su/NewSFTModel is a 4 billion parameter language model fine-tuned using the TRL framework. It is designed for general text generation tasks, leveraging a 32768 token context length. This model focuses on demonstrating capabilities derived from Supervised Fine-Tuning (SFT) on a base model.

Loading preview...

Overview

The mjf-su/NewSFTModel is a 4 billion parameter language model developed by mjf-su. It has been fine-tuned using the TRL (Transformers Reinforcement Learning) library, indicating a focus on supervised fine-tuning (SFT) techniques. The model supports a substantial context length of 32768 tokens, allowing for processing and generating longer sequences of text.

Key Capabilities

  • Text Generation: Capable of generating human-like text based on given prompts.
  • Supervised Fine-Tuning (SFT): Built upon a base model through SFT, suggesting an emphasis on learning from labeled data.
  • Extended Context Window: Features a 32768 token context length, beneficial for tasks requiring extensive contextual understanding.

Training Details

The model's training procedure involved Supervised Fine-Tuning (SFT) and utilized several key frameworks:

  • TRL: 1.4.0
  • Transformers: 4.57.6
  • Pytorch: 2.10.0
  • Datasets: 4.8.5
  • Tokenizers: 0.22.2

Training progress and metrics can be visualized via Weights & Biases, as indicated in the original model card. This model is suitable for developers exploring SFT methodologies and requiring a model with a large context window for various text-based applications.