ericflo/Llama-3.1-8B-ContinuedTraining

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Sep 5, 2024License:apache-2.0Architecture:Transformer Open Weights Cold

The ericflo/Llama-3.1-8B-ContinuedTraining model is an 8 billion parameter language model developed by Eric Florenzano, based on the Meta-Llama-3.1-8B architecture with a 32768 token context length. It features a unique high-rank (128) adapter training approach, distinguishing it from typical low-rank fine-tuning, to enhance learning capacity and mitigate catastrophic forgetting. This model is primarily optimized for general text completion, instruction following, and Python-focused code generation, trained on a diverse blend of high-quality datasets including FineTome-100k, dclm-baseline-1.0-parquet, English Wikipedia, and Starcoder.

Loading preview...

Overview

The ericflo/Llama-3.1-8B-ContinuedTraining model is an 8 billion parameter Large Language Model (LLM) developed by Eric Florenzano, built upon the Meta-Llama-3.1-8B architecture. This model stands out due to its unique training methodology, which involves direct training on a diverse mixture of high-quality datasets for general text, code completion, and instruction-following tasks, rather than fine-tuning an already instruction-tuned model. It utilizes a high-rank adapter (rank 128) to significantly enhance learning capacity and reduce catastrophic forgetting, a key differentiator from common low-rank adaptation (LoRA) methods.

Key Capabilities

  • General Text Completion and Generation: Proficient in generating and predicting text across various domains.
  • Python Code Completion: Specifically trained on the Starcoder dataset to assist with Python code generation.
  • Robust Instruction Following: Capable of understanding and executing complex instructions, trained with alternating ChatML and Llama Chat templates for broad applicability.
  • Broad Language Understanding: Benefits from diverse training data, including English Wikipedia and Apple's dclm-baseline-1.0-parquet, for comprehensive language comprehension.

Good for

  • Developers seeking a Llama-3.1-8B variant with enhanced learning capacity for multi-task scenarios.
  • Applications requiring strong performance in both general instruction following and Python code generation.
  • Use cases where mitigating catastrophic forgetting during continued training is critical.
  • Tasks involving text generation, code assistance, and complex instruction processing.