DeepAuto-AI/ldm_soup_Llama-3.1-8B-Inst

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Sep 16, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst is an 8-billion-parameter instruction-tuned language model developed by deepAuto.ai and built on VAGOsolutions/Llama-3.1-SauerkrautLM-8B-Instruct. It uses a latent diffusion model to optimize the pretrained weights for better performance on the Winogrande and ARC-Challenge datasets, then applies model-soup averaging to merge the best weight configurations into one model. The result performs better on unseen leaderboard tasks without additional task-specific training. It is intended for general language understanding and reasoning tasks and shows improved zero-shot capabilities.


Model Overview

DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst is an 8-billion-parameter instruction-tuned model developed by deepAuto.ai. It is built on the VAGOsolutions/Llama-3.1-SauerkrautLM-8B-Instruct model and takes a generative approach to weight optimization: instead of training the weights further, it samples them from a learned distribution.

Key Capabilities & Methodology

  • Latent Diffusion Model for Weight Optimization: A latent diffusion model is trained to learn the distribution of the base model's weights. Sampling from this learned distribution makes it possible to explore new candidate weight configurations and search for well-performing ones.
  • Model-Soup Averaging: After multiple sets of weights are sampled, the best-performing ones, chosen for their results on the Winogrande and ARC-Challenge datasets, are merged with a model-soup averaging technique, which combines them linearly into a single set of weights.
  • Improved Zero-Shot Performance: According to the reported evaluation results, this approach improves performance on leaderboard tasks that were not used during optimization, without any additional task-specific fine-tuning.
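To make the merging step concrete, here is a minimal sketch of model-soup averaging over sampled weight sets. The card does not say how many sampled sets were merged, how they were ranked, or whether the average was uniform or weighted, so this assumes a uniform average of the top-k candidates by a validation score (in the spirit of the "uniform soup" from the model-soups literature). The names `StateDict`, `uniform_soup` and `top_k_soup` are hypothetical, and weights are plain Python lists rather than real tensors; the latent diffusion sampling itself is not shown.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def uniform_soup(candidates: Sequence[StateDict]) -> StateDict:
    """Average every parameter element-wise across the candidate weight sets."""
    if not candidates:
        raise ValueError("need at least one candidate")
    keys = list(candidates[0].keys())
    for c in candidates:
        if list(c.keys()) != keys:
            raise ValueError("all candidates must share the same parameter names")
    soup: StateDict = {}
    for name in keys:
        sizes = {len(c[name]) for c in candidates}
        if len(sizes) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*) walks the same element index across all candidates.
        columns = zip(*(c[name] for c in candidates))
        soup[name] = [sum(vals) / len(candidates) for vals in columns]
    return soup


def top_k_soup(candidates: Sequence[StateDict], scores: Sequence[float], k: int) -> StateDict:
    """Keep the k highest-scoring candidates (assumed selection rule), then average them."""
    if len(candidates) != len(scores):
        raise ValueError("one score per candidate is required")
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    best = [candidates[i] for i in ranked[:k]]
    return uniform_soup(best)


if __name__ == "__main__":
    # Three hypothetical sampled weight sets; the third is a poor outlier.
    sampled = [
        {"w": [1.0, 2.0], "b": [0.0]},
        {"w": [3.0, 4.0], "b": [1.0]},
        {"w": [100.0, 100.0], "b": [50.0]},
    ]
    scores = [0.71, 0.74, 0.40]  # hypothetical validation accuracies
    print(top_k_soup(sampled, scores, k=2))  # {'w': [2.0, 3.0], 'b': [0.5]}
```

Filtering by score before averaging keeps a poorly performing sample, like the outlier above, from pulling the merged weights away from the good candidates. A score-weighted average or a greedy soup, which adds candidates one at a time only if validation accuracy does not drop, would be reasonable alternatives.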

Evaluation Highlights

On the Open LLM Leaderboard, the model reaches an average score of 28.64. Notable results include 80.33 on IFEval (0-shot) and 31.10 on BBH (3-shot), which point to solid instruction-following and general reasoning ability. The underlying methodology is inspired by research on Diffusion-Based Neural Network Weights Generation.