Menlo/llama3-s-v0.1 is an 8-billion-parameter Llama-3-architecture model developed by Homebrew Research, designed to natively understand both audio and text inputs. This model expands on previous checkpoints by incorporating 1.3 billion tokens from the Instruction Speech v1.5 dataset, enhancing its sound-understanding capabilities. It is primarily intended for research applications focused on improving large language models' ability to process and interpret sound alongside text.
Overview
Menlo/llama3-s-v0.1 is an 8 billion parameter model built on the Llama-3 architecture by Homebrew Research. This model is uniquely designed to process and understand both text and audio inputs, generating text outputs. It represents a continuation of the llama3s family, specifically enhancing sound understanding capabilities by training on an additional 1.3 billion tokens from the Instruction Speech v1.5 dataset.
Key Capabilities
- Multimodal Input: Natively understands and processes both text and sound inputs.
- Llama-3 Architecture: Leverages the robust Llama-3 base for strong language understanding.
- Enhanced Sound Understanding: Continually trained to improve its ability to interpret audio, building on previous llama3s checkpoints.
- Research-Oriented: Primarily intended for research applications exploring multimodal LLMs.
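The capabilities above can be exercised with a standard Hugging Face transformers loading pattern. This is a hedged sketch, not an official usage example from the card: the repo id `Menlo/llama3-s-v0.1` is taken from the card, but how raw audio is converted into model inputs is project-specific, so it is represented here as an opaque string.

```python
# Hypothetical usage sketch for Menlo/llama3-s-v0.1 with Hugging Face
# transformers. The audio-to-token step is handled by the project's own
# tooling and is NOT shown here; "audio_repr" stands in for its output.

def build_prompt(audio_repr: str, instruction: str) -> str:
    """Combine an audio representation (e.g. discrete sound tokens produced
    by the project's preprocessing) with a text instruction."""
    return f"{audio_repr}\n{instruction}"

def main() -> None:
    # Heavy imports are kept inside main() so the prompt helper above
    # stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Menlo/llama3-s-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = build_prompt("<audio representation>", "Describe this sound.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

Because the model emits text only, the output of `generate` is decoded back to a plain string, matching the text-output behavior described above.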
Training Details
The model underwent continual training for 14 hours on a cluster of 8x NVIDIA H100-SXM-80GB GPUs. Key training arguments included a global batch size of 128, a learning rate of 1.5e-4 with a cosine scheduler, and the Adam optimizer. The training process focused on improving sound-text semantics, as reflected in the reported training loss curve.
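The cosine schedule named above can be sketched in a few lines. The peak learning rate of 1.5e-4 comes from the card; the warmup length, total step count, and minimum learning rate are illustrative assumptions, not values from the card.

```python
# Sketch of a cosine learning-rate schedule with optional linear warmup.
# PEAK_LR matches the card's reported learning rate; everything else
# (warmup_steps, total_steps, min_lr) is an assumption for illustration.
import math

PEAK_LR = 1.5e-4  # from the training arguments in the card

def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
              warmup_steps: int = 0, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if warmup_steps and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# The schedule starts at the peak (1.5e-4 here), passes through half the
# peak-to-min range at the midpoint, and reaches min_lr at the final step.
```

With `warmup_steps=0`, `cosine_lr(0, 1000)` returns the peak rate and `cosine_lr(1000, 1000)` returns the minimum, which is the shape the card's loss-curve description implies.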
Good for
- Multimodal Research: Ideal for researchers exploring the integration of audio and text in LLMs.
- Sound-to-Text Applications: Suitable for experimental use cases requiring an LLM to respond to spoken or environmental audio cues.
- Developing Audio-Aware Agents: A foundational model for building agents that can interpret and react to sound alongside textual instructions.