Name: occiglot/occiglot-7b-es-en-instruct API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: occiglot

Occiglot-7B-ES-EN-Instruct Overview

Occiglot-7B-ES-EN-Instruct is a 7 billion parameter instruction-tuned language model developed by the Occiglot Research Collective. It is specifically designed to support both Spanish and English, alongside code generation, making it a polyglot model for Western languages. This model is an instruct version, fine-tuned from the occiglot-7b-es-en base model with an additional 160 million tokens of multilingual and code instructions.

Key Capabilities & Features

Bilingual Proficiency: Strong performance in both Spanish and English, as evidenced by evaluation results across various benchmarks.
Instruction Following: Trained with the chatml instruction template, enabling effective interaction through conversational prompts.
Code Support: Includes code in its training data, suggesting capabilities for code-related tasks.
Research-Oriented: Part of an ongoing open research project, with an invitation for collaborations on multilingual language models and evaluations.

Training & Data

The model underwent full instruction fine-tuning using axolotl on 8xH100 GPUs. Its training data was evenly split between Spanish and English, incorporating datasets like Open-Hermes-2.5 (English/Code) and Mentor-ES, Squad-es, OASST-2 (Spanish subset), and Aya-Dataset (Spanish subset) for Spanish.

Important Considerations

Safety Alignment: The model was not safety aligned and may produce problematic outputs.
Evaluation Nuances: Preliminary evaluation results, especially for non-English languages, are based on partially machine-translated datasets and English prompts, and should be interpreted with caution.

Overview

Occiglot-7B-ES-EN-Instruct Overview

Key Capabilities & Features

Training & Data

Important Considerations

Full Model Card (README)