maritaca-ai/sabia-7b

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Nov 8, 2023 · Architecture: Transformer · 0.1K Cold

Sabiá-7B is a 7 billion parameter auto-regressive language model developed by Maritaca AI, built on the LLaMA-1-7B architecture. It was pretrained on 7 billion tokens from the Portuguese subset of ClueWeb22, with further training on an additional 10 billion tokens. The model is optimized for Portuguese language tasks; because it was pretrained without instruction-tuning, it is best used with few-shot prompting.


Sabiá-7B: A Portuguese-Optimized LLaMA-1 Model

Sabiá-7B is a 7 billion parameter auto-regressive language model developed by Maritaca AI that reuses the LLaMA-1-7B architecture and tokenizer. It was pretrained on a 7 billion token Portuguese subset of ClueWeb22, followed by an additional 10 billion tokens of training, making it highly specialized for the Portuguese language.

Key Capabilities & Characteristics

  • Portuguese Language Focus: Specifically designed and trained for Portuguese, demonstrating strong performance on Portuguese benchmarks like Poeta, where it outperforms LLaMA-1-7B and LLaMA-2-7B.
  • LLaMA-1 Architecture: Utilizes the LLaMA-1-7B architecture and tokenizer, providing a familiar base for developers.
  • Few-shot Learning: Recommended for few-shot rather than zero-shot use, as it was trained solely on a language-modeling objective without instruction-tuning.
  • Text-only: Accepts text input and produces text output; no other modalities are supported.
  • Research Use Only: Licensed under the same restrictions as LLaMA-1, limiting its use to research purposes.
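Because the model has no instruction-tuning, prompts should include a few labeled in-context examples before the query. The sketch below shows one way to do this with Hugging Face `transformers`; the sentiment task, the `Resenha`/`Sentimento` prompt template, and the generation settings are illustrative assumptions, not prescriptions from the model card.

```python
# Few-shot prompting sketch for Sabiá-7B via Hugging Face transformers.
# The sentiment-classification task and prompt format are hypothetical
# examples; only the model id (maritaca-ai/sabia-7b) comes from the card.

def build_few_shot_prompt(examples, query):
    """Concatenate labeled examples, then the unlabeled query, into one prompt."""
    blocks = [f"Resenha: {text}\nSentimento: {label}" for text, label in examples]
    blocks.append(f"Resenha: {query}\nSentimento:")
    return "\n\n".join(blocks)


def complete(prompt, max_new_tokens=5):
    """Greedily complete `prompt` with Sabiá-7B (downloads the weights on first call)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("maritaca-ai/sabia-7b")
    model = AutoModelForCausalLM.from_pretrained(
        "maritaca-ai/sabia-7b",
        torch_dtype=torch.bfloat16,  # halves memory relative to fp32
        device_map="auto",           # spread layers across available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Return only the newly generated text, not the echoed prompt.
    new_ids = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_ids, skip_special_tokens=True)


examples = [
    ("Adorei o filme, recomendo a todos!", "positivo"),
    ("O produto chegou quebrado e o suporte não respondeu.", "negativo"),
]
prompt = build_few_shot_prompt(examples, "A comida estava ótima e o serviço foi rápido.")
# complete(prompt) would then generate the model's predicted label.
```

Ending the prompt at `Sentimento:` steers the raw language model to continue with a label in the same pattern as the examples, which is the standard workaround for the missing instruction-tuning.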

Performance Highlights

Sabiá-7B outperforms its LLaMA counterparts on Portuguese benchmarks. On the Poeta benchmark, for instance, it achieved an NPM (Normalized Preferred Metric) of 48.5, surpassing LLaMA-1-7B (33.0) and LLaMA-2-7B (43.7). While optimized for Portuguese, its performance on English datasets remains comparable to LLaMA-1-7B.

Usage Recommendation

This model is ideal for researchers and developers working on Portuguese natural language processing tasks who need a robust base model for few-shot applications. Users should be aware of its research-only license and, given the lack of instruction-tuning, the need for few-shot prompting.