baya1116/Phase15-DeepSeek-FFT
baya1116/Phase15-DeepSeek-FFT is a 1.1 billion parameter language model based on TinyLlama, enhanced with a 75.8 million parameter HyperNetwork. This model employs a novel recursive soft-prompting mechanism, generating 128-token soft prompts from queries and raw tokens. It is specifically designed for efficient fine-tuning through per-chunk SFT, making it suitable for tasks requiring adaptive context generation.
Model Overview
baya1116/Phase15-DeepSeek-FFT is a 1.1 billion parameter language model built upon the TinyLlama-1.1B architecture. Its core innovation lies in the integration of a 75.8 million parameter HyperNetwork that dynamically generates 128-token soft prompts. This HyperNetwork processes queries and raw tokens to recursively produce soft prompts, which are then prepended to the LLM's input, allowing for adaptive context generation.
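The released integration code is not reproduced here, but the idea can be illustrated with a minimal sketch, assuming the standard Hugging Face `inputs_embeds` path for prepending the generated soft prompt to the query embeddings. The checkpoint name, function name, and tensor shapes below are illustrative assumptions, not the model's actual code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint; the card only names "TinyLlama-1.1B" as the base.
BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

def forward_with_soft_prompt(query: str, soft_prompt: torch.Tensor):
    """soft_prompt: (1, 128, hidden_size) embeddings produced by the HyperNetwork."""
    ids = tokenizer(query, return_tensors="pt").input_ids
    tok_emb = model.get_input_embeddings()(ids)                      # (1, T, hidden)
    # Prepend the 128 soft-prompt vectors in front of the query token embeddings.
    inputs_embeds = torch.cat([soft_prompt.to(tok_emb.dtype), tok_emb], dim=1)
    attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)
    return model(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
```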
Key Architectural Features
- TinyLlama-1.1B Base: Uses TinyLlama-1.1B as the foundational language model, which undergoes a full fine-tune in the second training stage.
- HyperNetwork Integration: A smaller, dedicated network responsible for creating dynamic soft prompts.
- Recursive Soft Prompting: Soft prompts (`sp`) are generated iteratively, where `sp_{c+1} = HyperNet(sp_c, prev_chunk_tokens)`, enabling the context to evolve with the input (see the sketch after this list).
- Chunk-based SFT: Training is performed using per-chunk Supervised Fine-Tuning (SFT), with answers split into 16-token chunks and standard teacher forcing applied within each chunk.
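The recursion and the per-chunk objective could look roughly like the sketch below. Only the update rule `sp_{c+1} = HyperNet(sp_c, prev_chunk_tokens)` and the 16-token chunking come from this card; the `HyperNet` internals, the mean-pooling, the initial prompt, and the single shared dtype/device are assumptions made for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

CHUNK_LEN, NUM_SOFT, HIDDEN = 16, 128, 2048   # TinyLlama-1.1B hidden size is 2048

class HyperNet(nn.Module):
    """Assumed interface: (current soft prompt, previous-chunk embeddings) -> next soft prompt."""
    def __init__(self):
        super().__init__()
        self.mix = nn.Linear(2 * HIDDEN, HIDDEN)

    def forward(self, sp: torch.Tensor, prev_emb: torch.Tensor) -> torch.Tensor:
        # sp: (B, 128, H); prev_emb: (B, T, H), mean-pooled and broadcast over the prompt.
        pooled = prev_emb.mean(dim=1, keepdim=True).expand_as(sp)
        return self.mix(torch.cat([sp, pooled], dim=-1))

def per_chunk_loss(llm, embed, hypernet, query_emb, answer_ids):
    """Teacher forcing inside each 16-token chunk; the soft prompt evolves between chunks."""
    sp = hypernet(torch.zeros(query_emb.shape[0], NUM_SOFT, HIDDEN), query_emb)  # sp_0 from the query
    total, chunks = 0.0, answer_ids.split(CHUNK_LEN, dim=1)
    for chunk in chunks:
        chunk_emb = embed(chunk)
        inputs = torch.cat([sp, query_emb, chunk_emb], dim=1)
        logits = llm(inputs_embeds=inputs).logits
        L = chunk.shape[1]
        pred = logits[:, -(L + 1):-1, :]      # positions whose next-token targets are chunk tokens 0..L-1
        total = total + F.cross_entropy(pred.reshape(-1, pred.shape[-1]), chunk.reshape(-1))
        sp = hypernet(sp, chunk_emb)          # recursive update: sp_{c+1} = HyperNet(sp_c, chunk)
    return total / len(chunks)
```

A training step would average this loss over a batch and backpropagate through both the HyperNetwork and the LLM, e.g. `per_chunk_loss(model, model.get_input_embeddings(), HyperNet(), query_emb, answer_ids)`.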
Training Details
Training uses a batch size of 64 (an effective batch size of 64 with gradient accumulation), learning rates of 1e-4 for the HyperNetwork and 3e-5 for the LLM, and bfloat16 precision with gradient checkpointing enabled on the LLM. Answer tokens are processed in 16-token chunks, with 128 soft tokens and a maximum of 256 tokens each for queries and answers.
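A rough sketch of that configuration is shown below; the optimizer choice, checkpoint name, and code structure are assumptions, and only the numeric settings are taken from this card:

```python
import torch
from transformers import AutoModelForCausalLM

NUM_SOFT_TOKENS = 128    # soft prompt length
CHUNK_LEN = 16           # answer tokens per SFT chunk
MAX_QUERY_LEN = 256      # maximum query length (tokens)
MAX_ANSWER_LEN = 256     # maximum answer length (tokens)
BATCH_SIZE = 64          # effective batch size of 64 with gradient accumulation

llm = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",    # stand-in for the actual base checkpoint
    torch_dtype=torch.bfloat16,
)
llm.gradient_checkpointing_enable()          # memory savings on the fully fine-tuned 1.1B LLM

hypernet = HyperNet()                        # the ~75.8M-parameter prompt generator (sketched above)

# Separate learning rates: 1e-4 for the HyperNetwork, 3e-5 for the LLM (AdamW is an assumption).
optimizer = torch.optim.AdamW([
    {"params": hypernet.parameters(), "lr": 1e-4},
    {"params": llm.parameters(), "lr": 3e-5},
])
```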
Potential Use Cases
This architecture is particularly well-suited for scenarios where dynamic and context-aware prompting is beneficial, potentially improving performance on tasks that require nuanced understanding and generation based on evolving input sequences.