Cabra 7B: A Portuguese LLaMA 2 Fine-tune
Cabra 7B is a 7-billion-parameter model developed by botbotrobotics, produced by QLoRA fine-tuning of LLaMA 2 7B Chat. Its primary distinction is its training data: the PortugueseDolly dataset, a Portuguese translation of the original Databricks Dolly 15k dataset.
Key Characteristics
- Base Model: LLaMA 2 7B Chat
- Fine-tuning Method: QLoRA
- Training Data: PortugueseDolly dataset, focusing on Portuguese language instruction following.
- Context Length: 4096 tokens.
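Because the base model is LLaMA 2 7B Chat, prompts to Cabra 7B will most likely need to follow the standard LLaMA 2 chat template. The helper below is a minimal sketch of that template; the function name and the Portuguese system prompt are illustrative assumptions, not part of the model's documentation.

```python
def format_llama2_prompt(user_message: str,
                         system_prompt: str = "Você é um assistente prestativo.") -> str:
    """Wrap a user message in the LLaMA 2 Chat instruction template.

    The <<SYS>>...<</SYS>> block carries the system prompt; the model's
    reply is expected after the closing [/INST] tag.
    """
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

# Example: a Portuguese instruction formatted for the model
prompt = format_llama2_prompt("Qual é a capital do Brasil?")
```

The resulting string can be passed directly to a tokenizer and the model's `generate` method.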
Intended Use and Limitations
This model is explicitly designated for demonstration and research purposes only, with commercial use prohibited. The developers note that the model requires further training and may generate inaccuracies or false information. Users should be aware of these limitations, particularly regarding factual correctness.
Good For
- Portuguese Language Research: Exploring the performance of LLaMA 2 fine-tunes on Portuguese instruction datasets.
- Demonstrations: Showcasing basic conversational and instruction-following capabilities in Portuguese.
- Further Development: Serving as a base for additional fine-tuning or experimentation in Portuguese NLP tasks.
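For those using the model as a base for additional fine-tuning, a typical QLoRA setup pairs 4-bit NF4 quantization with low-rank adapters. The configuration below is a sketch of that pattern using the `transformers` and `peft` libraries; all hyperparameter values and target modules are illustrative assumptions, not the configuration the developers actually used.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base weights, as is typical for QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Low-rank adapters on the attention projections; rank and dropout here
# are common defaults, not values published for Cabra 7B.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

These two objects would then be passed to `AutoModelForCausalLM.from_pretrained` (via `quantization_config`) and `peft.get_peft_model`, respectively, before training on a new Portuguese dataset.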