Name: glenn2/LFG-1 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: glenn2

LFG-1: Multimodal Conversational Audio-Language Model

LFG-1 (Listening Fusion Gemma) is a 26-billion-parameter multimodal model developed by glenn2 as a personal learning project. It connects a Gemma 4 E2B audio encoder to a Gemma 4 26B-A4B text model through a custom-trained projection layer. Because the text model receives projected acoustic features directly, no Speech-to-Text (STT) stage is needed, and conversational cues that a transcript would discard, such as pacing, pauses, and tone, are preserved.
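To make the role of the projection layer concrete, here is a minimal sketch in plain Python. It assumes the projection is a single linear map from the audio encoder's feature width to the text model's embedding width. The card does not state the actual layer shape or dimensions, so `AUDIO_DIM`, `TEXT_DIM`, `make_projection`, and `project` are hypothetical names and values chosen for illustration, not the project's code.

```python
import random

# Hypothetical sizes for illustration only; the real encoder and
# text-model widths are not stated in this card.
AUDIO_DIM = 8
TEXT_DIM = 16


def make_projection(audio_dim, text_dim, seed=0):
    """Build a random weight matrix (text_dim x audio_dim) and a zero bias."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(audio_dim)]
              for _ in range(text_dim)]
    bias = [0.0] * text_dim
    return weight, bias


def project(frames, weight, bias):
    """Map each audio feature frame into the text embedding space."""
    return [
        [sum(w * x for w, x in zip(row, frame)) + b
         for row, b in zip(weight, bias)]
        for frame in frames
    ]


# Three fake audio frames standing in for audio encoder output.
audio_frames = [[0.5] * AUDIO_DIM for _ in range(3)]
weight, bias = make_projection(AUDIO_DIM, TEXT_DIM)
audio_tokens = project(audio_frames, weight, bias)

# Projected frames now have the same width as text embeddings, so they can
# be placed in the same input sequence (two placeholder text embeddings here).
text_embeddings = [[0.0] * TEXT_DIM for _ in range(2)]
sequence = audio_tokens + text_embeddings
print(len(sequence), len(sequence[0]))
```

The point of the sketch is only the shape change: each audio frame leaves the projection with the text model's embedding width, which is what lets audio bypass a transcript.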

Key Capabilities & Features

Native Audio Understanding: Processes raw acoustic data for nuanced conversational interaction.
Real-time Conversational Audio: Designed for immediate speech-to-response generation.
Streaming Text Output: Provides continuous text responses.
Multimodal Support: Capable of handling simultaneous audio and image inputs.
Preserved Text Reasoning: The core Gemma 4 text backbone remains frozen during audio-projection training, ensuring its original language and reasoning capabilities are maintained.
Apple Silicon Optimized: Built with the MLX framework (mlx-vlm) for efficient local execution on Mac hardware.
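
The "Preserved Text Reasoning" point can be illustrated with a toy example. The sketch below is not the project's training code: it reduces each component to a single scalar and uses hypothetical names (`params`, `forward`, `sgd_step`) to show how a frozen backbone is skipped during gradient updates while the projection is trained.

```python
# Toy parameters: one scalar per component, purely for illustration.
params = {
    "text_backbone": {"value": 2.0, "trainable": False},  # frozen
    "audio_projection": {"value": 0.0, "trainable": True},
}


def forward(x):
    """Toy model: apply the projection first, then the backbone."""
    proj = params["audio_projection"]["value"]
    backbone = params["text_backbone"]["value"]
    return backbone * (proj * x)


def sgd_step(x, target, lr=0.05):
    """One gradient step on squared error, skipping frozen parameters."""
    proj = params["audio_projection"]["value"]
    backbone = params["text_backbone"]["value"]
    error = forward(x) - target
    grads = {
        "text_backbone": 2 * error * proj * x,
        "audio_projection": 2 * error * backbone * x,
    }
    for name, p in params.items():
        if p["trainable"]:  # frozen parameters are never updated
            p["value"] -= lr * grads[name]


for _ in range(200):
    sgd_step(x=1.0, target=3.0)

print(params["text_backbone"]["value"], round(params["audio_projection"]["value"], 6))
```

Only the projection value moves toward the target; the backbone value is identical before and after training, which is the property the card describes.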

Intended Use & Requirements

LFG-1 is primarily intended for real-time conversational audio applications running locally on Apple Silicon. The current audio projection layer is trained exclusively on English audio. The combined weights total approximately 48 GB, so the model requires at least 64 GB of unified memory on an Apple Silicon device and about 50 GB of free disk space. The project is actively evolving, and there are plans to extend the audio training data to additional languages.
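
As a quick pre-flight check, the figures above can be encoded in a small script. This is a sketch under assumptions: the card does not say whether GB means decimal or binary units (decimal is assumed here), and `check_requirements` and `free_disk_gb` are hypothetical helpers, not part of any LFG-1 tooling. Installed memory is passed in by hand rather than detected.

```python
import shutil

GB = 10**9  # assumption: decimal gigabytes; the card does not specify

MIN_MEMORY_GB = 64  # minimum unified memory stated above
MIN_DISK_GB = 50    # approximate disk space stated above
WEIGHTS_GB = 48     # approximate combined weight size stated above


def check_requirements(memory_gb, disk_gb):
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    if memory_gb < MIN_MEMORY_GB:
        problems.append(
            f"unified memory {memory_gb} GB is below the {MIN_MEMORY_GB} GB minimum"
        )
    if disk_gb < MIN_DISK_GB:
        problems.append(
            f"free disk {disk_gb:.0f} GB is below the {MIN_DISK_GB} GB needed"
        )
    return problems


def free_disk_gb(path="."):
    """Free space on the volume containing `path`, in decimal GB."""
    return shutil.disk_usage(path).free / GB


# Memory left over for the OS, caches, and activations on a 64 GB machine.
headroom_gb = MIN_MEMORY_GB - WEIGHTS_GB

for problem in check_requirements(memory_gb=64, disk_gb=free_disk_gb()):
    print("WARNING:", problem)
```

On a machine with exactly 64 GB, the weights alone leave roughly 16 GB for everything else, which is why the stated minimum sits well above the weight size.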
