Menlo/llama3-s-2024-07-08

Text Generation · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Jul 8, 2024 · License: apache-2.0 · Architecture: Transformer

Menlo/llama3-s-2024-07-08 is an 8 billion parameter Llama-3 architecture model developed by Homebrew Research. Built upon Meta-Llama-3-8B-Instruct, it natively understands both audio and text inputs. It was fine-tuned on 700 million tokens from the Instruction Speech v1 dataset, making it suitable for research applications focused on sound-text semantics.


Model Overview

Menlo/llama3-s-2024-07-08 is an 8 billion parameter model from the llama3-s family, developed by Homebrew Research. It is built upon the Llama-3 architecture and extends the capabilities of Meta-Llama-3-8B-Instruct by integrating native audio and text understanding. The model processes both sound and text as input to generate text output.

Key Capabilities & Training

This model's primary differentiator is its multimodal input capability, specifically its ability to interpret sound. It was continually trained for 8 hours on a cluster of 8x NVIDIA H100-SXM-80GB GPUs, using 700 million tokens from the Instruction Speech v1 dataset to enhance its sound understanding. Training used the Adam-mini optimizer with a learning rate of 5e-5 and a global batch size of 128. Although still at an early stage, the model shows an emerging grasp of sound-text semantics.
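The stated token budget and batch size let us estimate the scale of the run. This is a back-of-the-envelope sketch: the 700M-token budget and global batch size of 128 come from the card above, while the 4096-token packed sequence length is an assumption for illustration only.

```python
# Rough step-count estimate for the training run described above.
# TOTAL_TOKENS and GLOBAL_BATCH_SIZE are from the model card;
# SEQ_LEN is an assumed packed sequence length, not stated in the card.
TOTAL_TOKENS = 700_000_000
GLOBAL_BATCH_SIZE = 128
SEQ_LEN = 4096

tokens_per_step = GLOBAL_BATCH_SIZE * SEQ_LEN  # tokens consumed per optimizer step
num_steps = TOTAL_TOKENS // tokens_per_step

print(f"~{num_steps} optimizer steps")  # roughly 1,335 steps under these assumptions
```

Under these assumptions the 8-hour run works out to only on the order of a thousand optimizer steps, consistent with a short continual-training pass rather than full pre-training.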

Intended Use Cases

This model family is primarily intended for research applications, particularly those focused on improving and exploring sound understanding capabilities within large language models. Users can convert audio files into sound tokens using the provided Encodec-based Python script before feeding them into the model alongside text. The model is English-only.
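The card does not reproduce the Encodec-based conversion script itself, so the flow can only be sketched. In this sketch, the `<|sound_NNNN|>` token format and the `<|sound_start|>`/`<|sound_end|>` markers are illustrative assumptions, not the project's actual vocabulary; the codec indices stand in for output from EnCodec's first codebook.

```python
# Hedged sketch: render codec indices as special-token text that can be
# interleaved with a text instruction. The token names below are assumed
# for illustration; the real llama3-s script may use a different format.
from typing import List

def codes_to_sound_tokens(codes: List[int]) -> str:
    """Render a sequence of codec indices as sound-token text."""
    body = "".join(f"<|sound_{c:04d}|>" for c in codes)
    return f"<|sound_start|>{body}<|sound_end|>"

def build_prompt(codes: List[int], instruction: str) -> str:
    """Place the encoded audio before the text instruction."""
    return f"{codes_to_sound_tokens(codes)}\n{instruction}"

# Example with made-up indices:
prompt = build_prompt([17, 842, 3], "Transcribe the audio above.")
print(prompt)
```

The model then consumes this combined string as ordinary input and generates English text in response.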