occiglot/occiglot-7b-es-en
Occiglot-7B-ES-EN is a 7-billion-parameter generative language model developed by the Occiglot Research Collective. Based on Mistral-7B-v0.1, it was continually pre-trained on 112 billion tokens of additional multilingual and code data, with a block size of 8,192 tokens. The model specializes in Spanish, English, and code, which makes it a suitable base model for applications that need proficiency in these languages.
Occiglot-7B-ES-EN: A Multilingual Base Model for Spanish, English, and Code
Occiglot-7B-ES-EN is a 7-billion-parameter generative language model developed by the Occiglot Research Collective. It is based on Mistral-7B-v0.1 and was further pre-trained on 112 billion tokens of additional data covering Spanish, English, and code. Training used a block size of 8,192 tokens per sample.
Key Capabilities
- Bilingual Proficiency: Optimized for strong performance in both Spanish and English.
- Code Understanding: Trained on a substantial share of code data (about 13% of the continued pre-training mix), making it suitable for code-related tasks.
- Base Model: This is a general-purpose base model; it is not instruction-tuned or optimized for chat out of the box. An instruction-tuned variant, occiglot-7b-es-en-instruct, is available for conversational applications.
- Open Research Project: Part of an ongoing open research initiative by Occiglot, inviting collaborations for language model development and evaluation.
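Because this is a base model rather than a chat model, it is usually prompted with text to continue, not with instructions. The sketch below shows one way that might look with the Hugging Face transformers library. The repository id is the one given at the top of this card; the helper names `build_translation_prompt` and `complete`, the prompt layout, and the generation settings are illustrative assumptions, not part of the Occiglot release.

```python
MODEL_ID = "occiglot/occiglot-7b-es-en"  # repository id as listed on this card


def build_translation_prompt(examples, query):
    """Build a few-shot, completion-style prompt (hypothetical layout).

    `examples` is a list of (spanish, english) pairs; the prompt ends with an
    open "English:" line for the base model to continue.
    """
    lines = []
    for es, en in examples:
        lines.append(f"Español: {es}")
        lines.append(f"English: {en}")
        lines.append("")  # blank line between demonstrations
    lines.append(f"Español: {query}")
    lines.append("English:")
    return "\n".join(lines)


def complete(prompt, max_new_tokens=40):
    """Continue `prompt` with the base model (downloads the weights on first use)."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Dtype and device placement are left at library defaults in this sketch.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete(build_translation_prompt([("Hola", "Hello")], "Gracias"))` would return the prompt followed by the model's continuation. For instruction-style or conversational use, the instruct variant mentioned above is likely the better starting point.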
Training Details
The model was continually pre-trained on 128 x A100-80GB GPUs using the Determined framework, with bf16 precision and the AdamW optimizer. The training data distribution is approximately 52% Spanish, 34% English, and 13% code.
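To put these percentages in absolute terms, the following back-of-the-envelope sketch converts the reported mix into approximate token counts. It uses only figures from this card (112 billion tokens, an 8,192-token block size, and the rounded percentages); the block-count estimate additionally assumes every block is fully packed, which the card does not state.

```python
TOTAL_TOKENS = 112_000_000_000   # continued pre-training tokens, from this card
BLOCK_SIZE = 8_192               # tokens per training sample, from this card
MIX_PERCENT = {"Spanish": 52, "English": 34, "code": 13}  # rounded shares


def tokens_per_source(total, mix_percent):
    """Split a token budget by integer percentage shares."""
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


budget = tokens_per_source(TOTAL_TOKENS, MIX_PERCENT)

# The rounded shares sum to 99%, so part of the budget is left unassigned here.
unaccounted = TOTAL_TOKENS - sum(budget.values())

# Rough number of training samples, assuming fully packed blocks (an assumption).
approx_blocks = TOTAL_TOKENS // BLOCK_SIZE

for name, count in budget.items():
    print(f"{name}: ~{count / 1e9:.1f}B tokens")
print(f"unassigned due to rounding: ~{unaccounted / 1e9:.1f}B tokens")
print(f"approximate 8,192-token blocks: {approx_blocks:,}")
```

These are estimates derived from rounded figures, so they should be read as orders of magnitude rather than exact dataset sizes.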
Good for
- Developers and researchers working on applications requiring strong Spanish and English language understanding and generation.
- Building custom instruction-tuned models for specific tasks in these languages.
- Research into multilingual language models and continued pre-training techniques.