Zhengyi/LLaMA-Mesh

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | License: llama3.1 | Architecture: Transformer | 0.2K Warm

LLaMA-Mesh, developed by Zhengyi Wang et al. with base weights from Meta and fine-tuning by NVIDIA, is an 8-billion-parameter language model with a 32,768-token context length. It unifies text and 3D mesh generation by representing mesh data as plain text, enabling conversational 3D generation and understanding. The model generates 3D meshes from text prompts and interprets existing meshes, reaching quality comparable to models trained from scratch while maintaining strong text generation performance.


LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

LLaMA-Mesh is an 8-billion-parameter model, fine-tuned from Meta's Llama 3.1, that integrates 3D mesh generation directly into a large language model. Developed by Zhengyi Wang et al. and fine-tuned by NVIDIA, it handles 3D data by representing vertex coordinates and face definitions as plain text, so meshes fit into the model's existing token stream without expanding its vocabulary. This lets the LLM draw on spatial knowledge already present in textual sources and perform conversational 3D generation.

Key Capabilities

  • Unified Text and 3D Mesh Processing: Represents 3D mesh data (vertex coordinates, face definitions) as plain text, allowing LLMs to process both modalities within a single framework.
  • Conversational 3D Generation: Enables generating 3D meshes from text prompts and producing interleaved text and 3D mesh outputs.
  • 3D Mesh Understanding: Interprets 3D meshes supplied to it in text form.
  • Performance: Achieves 3D mesh generation quality on par with models trained from scratch, while preserving strong text generation capabilities.
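Because replies can interleave prose with mesh text, a client has to separate the two before handing the geometry to a viewer. The following is a minimal sketch of that step, not the authors' tooling. It assumes the mesh lines follow the Wavefront OBJ convention (`v x y z` for vertices, `f i j k` for faces with 1-based indices); the page itself only says "plain text". The function name `parse_mesh_from_reply` is hypothetical.

```python
def parse_mesh_from_reply(reply: str):
    """Split a model reply into vertices, faces, and the remaining prose.

    Assumes OBJ-style lines: "v x y z" and "f i j k ..." (1-based indices).
    Returns (vertices, faces, prose) with faces converted to 0-based indices.
    """
    vertices = []
    faces = []
    prose_lines = []
    for line in reply.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "v":
            try:
                vertices.append(tuple(float(p) for p in parts[1:]))
                continue
            except ValueError:
                pass  # not numeric after all, so treat the line as prose
        elif len(parts) >= 4 and parts[0] == "f":
            try:
                # OBJ allows "index/texture/normal"; keep only the vertex index.
                faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
                continue
            except ValueError:
                pass  # not a face definition, so treat the line as prose
        prose_lines.append(line)
    return vertices, faces, "\n".join(prose_lines).strip()


# Hypothetical reply, written by hand for illustration (not real model output).
example_reply = (
    "Here is a simple tetrahedron:\n"
    "v 0 0 0\n"
    "v 1 0 0\n"
    "v 0 1 0\n"
    "v 0 0 1\n"
    "f 1 2 3\n"
    "f 1 2 4\n"
    "Let me know if you want changes."
)
example_vertices, example_faces, example_prose = parse_mesh_from_reply(example_reply)
```

Converting faces to 0-based indices at parse time keeps them directly usable as list offsets into `vertices`, which is what most Python mesh libraries expect.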

Training and Data

The model is fine-tuned on a supervised dataset derived from a subset of the Objaverse dataset, specifically 30,000 meshes with fewer than 500 faces, converted into text strings. This training allows the model to acquire complex spatial knowledge for 3D mesh generation in a text-based format.
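The data preparation described above (keep small meshes, convert each to a text string) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual training pipeline: the OBJ-style line format, the quantization of coordinates to integers, and the bin count `NUM_BINS = 64` are all assumptions, and every function name here is hypothetical. Only the face limit of 500 comes from the text.

```python
MAX_FACES = 500  # from the text: meshes with fewer than 500 faces are kept
NUM_BINS = 64    # assumption: integer coordinate resolution, chosen for illustration


def quantize_vertices(vertices, num_bins=NUM_BINS):
    """Map float coordinates to integers in [0, num_bins - 1].

    The mesh is shifted to its bounding-box minimum and scaled uniformly by
    its largest extent, so proportions are preserved. Expects a non-empty list.
    """
    lo = [min(v[axis] for v in vertices) for axis in range(3)]
    hi = [max(v[axis] for v in vertices) for axis in range(3)]
    extent = max(h - l for h, l in zip(hi, lo)) or 1.0  # guard degenerate meshes
    return [
        tuple(round((v[axis] - lo[axis]) / extent * (num_bins - 1)) for axis in range(3))
        for v in vertices
    ]


def mesh_to_text(vertices, faces, num_bins=NUM_BINS):
    """Serialize a mesh as OBJ-style text with quantized integer coordinates."""
    lines = [f"v {x} {y} {z}" for x, y, z in quantize_vertices(vertices, num_bins)]
    # Faces are given as 0-based index tuples; OBJ text uses 1-based indices.
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines)


def build_training_strings(meshes, max_faces=MAX_FACES):
    """Keep meshes with fewer than max_faces faces and convert each to text."""
    return [mesh_to_text(v, f) for v, f in meshes if len(f) < max_faces]


# A hand-written tetrahedron used as sample input.
tetra_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
tetra_text = mesh_to_text(tetra_vertices, tetra_faces)
```

Quantizing to a small integer grid keeps each coordinate short in the serialized string, which matters because the whole mesh has to fit in the model's context window alongside the conversation.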

Licensing

LLaMA-Mesh is distributed under the NSCLv1 License for non-commercial use and incorporates components of Llama 3.1 technology, subject to the Llama 3.1 Community License Agreement.
