Alepach/notHumpback-M1-Rw-F-8b

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Apr 17, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

Alepach/notHumpback-M1-Rw-F-8b is an 8 billion parameter instruction-following language model based on Llama-3.1-8B, developed by Alepach. It implements a modified Humpback architecture, incorporating a novel 'self-rewriting' step before self-curation to enhance linguistic quality and data diversity. This model is specifically fine-tuned for instruction-following tasks and can be used to further refine instruction-response pairs in an iterative self-alignment pipeline.

Loading preview...

Overview

Alepach/notHumpback-M1-Rw-F-8b is an 8 billion parameter instruction-following model built upon the Llama-3.1-8B architecture. It integrates a modified version of the Humpback self-alignment pipeline, as proposed by Li et al., with an additional 'rewriting' step inspired by Nguyen et al. This model's unique approach involves a "self-rewriting" phase, performed by the seed model itself, which occurs before self-curation. This aims to improve the linguistic quality of web-sourced responses and potentially increase the diversity and quantity of high-quality training data by restructuring messy documents.

Key Capabilities

  • Instruction Following: Designed to accurately follow user instructions.
  • Self-Alignment Pipeline: Represents the first iteration of a self-alignment pipeline, trained on a combination of gold data and synthetically generated, rewritten, and curated data.
  • Data Enhancement: Utilizes a novel self-rewriting step to improve the linguistic quality of responses and potentially expand the usable dataset from web corpora like C4.

Training Details

The model was fine-tuned using TRL on a dataset combining samples from oasst1 and a synthetic dataset. The synthetic data was generated by applying self-augmentation, self-rewriting, and self-curation to 502k entries from the English subset of the c4 dataset.

Potential Use Cases

  • Instruction-following applications: Directly usable for generating responses based on given instructions.
  • Iterative Model Improvement: Can serve as the 'seed model' for subsequent iterations of the self-alignment pipeline, rewriting and scoring instruction-response pairs for further training.