occiglot/occiglot-7b-es-en

Text generation | Model size: 7B | Quant: FP8 | Context length: 4k | Published: Feb 19, 2024 | License: apache-2.0 | Architecture: Transformer | Open weights

Occiglot-7B-ES-EN is a 7 billion parameter generative language model developed by the Occiglot Research Collective. Based on Mistral-7B-v0.1, it is continually pre-trained on 112 billion tokens of additional multilingual and code data, with a block size of 8,192 tokens. This model specializes in Spanish and English, alongside code, making it a strong base model for applications requiring proficiency in these languages.


Occiglot-7B-ES-EN: A Multilingual Base Model for Spanish, English, and Code

Occiglot-7B-ES-EN is a 7 billion parameter generative language model developed by the Occiglot Research Collective. It starts from Mistral-7B-v0.1 and was continually pre-trained on 112 billion tokens of additional data focused on Spanish, English, and code. Each training sample used a block size of 8,192 tokens.
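The card reports the 8,192-token block size but does not say how training samples were assembled. A common approach in continued pre-training is "concatenate and chunk" packing: tokenized documents are joined with an end-of-sequence token and the stream is cut into fixed-length blocks. The sketch below illustrates that technique only; the function name `pack_into_blocks` is hypothetical, and the default `eos_id=2` (the `</s>` id in the Mistral tokenizer) is an assumption, not a documented detail of Occiglot's pipeline.

```python
from itertools import chain
from typing import Iterable, List

BLOCK_SIZE = 8192  # block size reported for Occiglot-7B-ES-EN's continued pre-training


def pack_into_blocks(docs: Iterable[List[int]], block_size: int = BLOCK_SIZE,
                     eos_id: int = 2) -> List[List[int]]:
    """Hypothetical packing helper (assumed approach, not Occiglot's code).

    Appends an EOS token to each tokenized document, concatenates them into
    one stream, and slices the stream into full blocks of `block_size`
    tokens. Any trailing tokens that do not fill a whole block are dropped.
    """
    stream = list(chain.from_iterable(doc + [eos_id] for doc in docs))
    n_full = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


if __name__ == "__main__":
    # Toy example with a tiny block size so the result is easy to inspect.
    print(pack_into_blocks([[5, 6, 7], [8, 9]], block_size=4))
```

With `block_size=4`, the toy input becomes the stream `[5, 6, 7, 2, 8, 9, 2]`, which yields one full block `[5, 6, 7, 2]`; the three leftover tokens are discarded. Dropping the remainder keeps every sample exactly `block_size` tokens long, at the cost of a small amount of data per shard.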

Key Capabilities

  • Bilingual Proficiency: Continued pre-training focused on Spanish and English, which together make up about 86% of the additional training data.
  • Code Understanding: Roughly 13% of the continued pre-training data is code, making the model suitable for code-related tasks.
  • Base Model: This is a general-purpose base model, not instruction-tuned or optimized for chat out-of-the-box. An instruction-tuned variant, occiglot-7b-es-en-instruct, is available for conversational applications.
  • Open Research Project: Part of an ongoing open research initiative by Occiglot, inviting collaborations for language model development and evaluation.

Training Details

The model was continually pre-trained on 128 x A100-80GB GPUs using the Determined framework, with bf16 precision and the AdamW optimizer. The training data distribution is approximately 52% Spanish, 34% English, and 13% code.
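To put the data mix in absolute terms, the snippet below applies the reported percentages to the 112 billion continued pre-training tokens. This assumes the percentages refer to that 112B-token budget; the card does not state this explicitly, and the shares sum to 99%, presumably because of rounding. The names `TOTAL_TOKENS`, `MIX_PERCENT`, and `approx_tokens_per_source` are illustrative only.

```python
from typing import Dict

# Assumption: the reported shares apply to the 112B continued pre-training tokens.
TOTAL_TOKENS = 112_000_000_000

# Approximate shares reported in the card, as integer percentages.
MIX_PERCENT: Dict[str, int] = {"Spanish": 52, "English": 34, "code": 13}


def approx_tokens_per_source(total: int, mix: Dict[str, int]) -> Dict[str, int]:
    """Apply integer percentage shares to a token budget using exact integer math."""
    return {name: total * pct // 100 for name, pct in mix.items()}


if __name__ == "__main__":
    for name, n in approx_tokens_per_source(TOTAL_TOKENS, MIX_PERCENT).items():
        print(f"{name:>8}: ~{n / 1e9:.2f}B tokens")
    print(f"sum of shares: {sum(MIX_PERCENT.values())}%")
```

Under that assumption, the mix works out to roughly 58.24B Spanish, 38.08B English, and 14.56B code tokens. Integer arithmetic is used so the counts are exact rather than subject to floating-point rounding.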

Good for

  • Developers and researchers working on applications requiring strong Spanish and English language understanding and generation.
  • Building custom instruction-tuned models for specific tasks in these languages.
  • Research into multilingual language models and continued pre-training techniques.